Abstract

In recent years, time-series mining has become a challenging issue for researchers. An important application lies in most monitoring tasks, which require analyzing large sets of time-series to learn usual patterns. Any deviation from this learned profile is then considered an unexpected situation. Moreover, complex applications may involve the temporal study of several heterogeneous parameters. In this paper, we propose a method for mining heterogeneous multivariate time-series for learning meaningful patterns. The proposed approach allows for mixed time-series, containing both pattern and non-pattern data, as well as for imprecise matches, outliers, stretching, and global translation of pattern instances in time. We present the early results of our approach in the context of monitoring the health status of a person at home. The purpose is to build a behavioral profile of a person by analyzing the time variations of several quantitative or qualitative parameters recorded through a set of sensors installed in the home.

Keywords: Time-series Mining, Heterogeneous Multivariate Time-series, Temporal Patterns, Unsupervised Learning, Home Health Telecare.



1 Introduction



In recent years, the increasing amount of stored data, possibly of high dimensionality, has encouraged researchers to take a great interest in discovering new patterns or building models from large datasets, a process also referred to as knowledge discovery or data mining. Moreover, many applications, from business to science, which serve mainly to support diagnosis and predict future behaviors, effectively deal with temporal sequences [25], encouraging the development of the related "time-series mining" research field.

In this work we investigate the issue of mining multidimensional and heterogeneous time-series for learning meaningful patterns. This is particularly useful in most monitoring tasks dealing with the detection of unusual trends or behaviors of an object or a situation described by the variation of data recorded from several types of sensors or information sources. One application is the monitoring of the health status of a person at home. The aim is to support caregivers by providing information about unusual trends in the person's behavior, observed through the variation of quantitative or qualitative parameters monitored at home. In this context of detecting adverse trends in health status, we aim to learn the person's lifestyle in order to build a profile that is sensitive to any critical deviation, and then to detect any unusual behavior in comparison with this profile. This approach to decision making is required because it is inconceivable to describe all possible critical situations of any nature and level, just as we do not yet have any way of learning the occurrence of such situations (by monitoring persons getting into critical situations and collecting the corresponding data). A learning process is therefore defined to build a behavioral profile of the person in their activities of daily living, that is, to extract and characterize frequent patterns from heterogeneous multivariate time-series recorded in usual conditions of life. The decision-making process must then be able to detect unusual behaviors by comparison with this profile. Therefore, the pattern learning process should allow for heterogeneous components defining the time-series, as well as for imprecise matches, outliers, stretching, and global translation in time of the sequences corresponding to the same pattern.

Our context also justifies the choice of extracting multidimensional patterns related to the person's behavior, rather than analyzing each parameter monitored at home individually, to make a joint decision about their condition of life. Indeed, the observable parameters are selected as a compromise between (1) being easily observable and non-invasive, and (2) gaining a full appreciation of the person's condition, sensitive to any change in the health status. Therefore, all parameters are closely related to each other, and their joint variations need to be preserved in multidimensional patterns representative of any usual behavior. Our objective is then to build a system performing an unsupervised extraction of this kind of temporal pattern within time-series representative of a person's usual conditions of life. Our contribution lies in extending an algorithm for pattern extraction to both multidimensional and heterogeneous time-series, accounting for the large amount of noise possibly present in pattern instances. We therefore also need to introduce a similarity measure appropriate to the comparison of multidimensional and heterogeneous time-series.

The rest of the paper is organized as follows. In section 2 we present works related to time-series mining, and in section 3 our methodology for extracting frequent patterns from heterogeneous multivariate time-series. Then, section 4 defines an appropriate similarity measure, and section 5 details the different steps required for pattern discovery and clustering. Section 6 discusses the early experimental results related to the proposed approach in the context of home health telecare. Finally, section 7 concludes the paper.

2 Related work 

Most current works dealing with home health telecare focus either on implementing a generic architecture for the integrated medical information system, on improving the daily life of patients using various automatic devices, specific equipment, and basic alarms, or on providing health care services to patients suffering from specific diseases like asthma, diabetes, cardiac or pulmonary diseases, or Alzheimer's disease. Rialle et al. have presented in [?] an overview of projects related to home health telecare. Basic alarms are raised by smart sensors or low layers of a local intelligence unit when a problem occurs at a short temporal scale: either one parameter exceeds a critical value (nocturia, pollakiuria, fall, hypertensive crisis, etc.), or a critical scenario involving the values of possibly more than one parameter is recognized (asthma crisis, etc.). Our focus is on the broadcasting of high-level alarms about the person's health status, which concern a larger temporal scale. This issue is addressed by first learning the daily living habits of a person, in order to detect unusual situations later. This behavioral profile is built by mining heterogeneous multivariate temporal data collected from sensors installed at home for learning meaningful patterns.

According to Antunes et al. [?], temporal sequences are series of nominal symbols from a particular alphabet, whereas time-series consist of continuous, real-valued elements. In this work we are interested in heterogeneous multivariate sequences of time-varying data, referred to as either heterogeneous multivariate time-series or heterogeneous multivariate temporal sequences. Time-series mining is an active field of research (see [?] for overviews of temporal data mining). Discovery algorithms for time-series aim at extracting important patterns such as similarities, trends, or periodicity, for the purpose of description or prediction [27]. Pattern discovery in time-series is useful for temporal sequence synthesis [?], as well as for learning tasks like association rule mining [?], classification [15], and unsupervised clustering [32].

By analogy with non-sequential domains, and because of the exponentially large set of possible subsequences in temporal sequences, time-series mining used to serve a learning task is sometimes referred to as "feature mining" [18, 20]. In non-sequential domains, feature selection corresponds to finding an optimal space of size m within the full d-dimensional feature space, where ideally m ≪ d. In sequential domains, "feature selection aims to select the best subset of sequential features", that is, the subsequences most relevant to the decision purpose. Time-series mining then acts as a preprocessor to construct the best subset of sequential features used to feed into learning algorithms. This is particularly useful to improve learning performance when time-series contain both pattern and non-pattern signals, as in [12]. According to [20], the selection criteria for feature mining include that features should be frequent and distinctive of at least one class, and that feature sets should not contain redundant features, that is, redundant subsequences.

Pattern discovery in time-series may be either (1) supervised, that is, finding patterns described by empirical knowledge or similar to a given "query sequence" [?], or (2) unsupervised, that is, finding recurrent patterns without any prior knowledge about the regularity of the data under study [7, 12, 13]. Lin et al. [22] have introduced the notion of "time-series motifs", addressing the unsupervised issue of finding previously unknown, frequently occurring patterns in time-series. These specific patterns are also referred to as "primitive shapes" [?] or "frequent temporal patterns" [14].

The techniques used for time-series mining vary according to the application, depending on the characteristics of both the temporal sequences under study and the expected patterns: degree of variability in the values, allowed transformations between instances of the same pattern, possible stretching in time. For instance, Hong et al. have experimented with training recurrent neural networks for the unsupervised extraction of multi-temporal sequence patterns [?]. However, this method suffers from noisy data. The use of finite state machines [12] may give good results, which may however degrade dramatically as the dimensionality, that is, the number of states, increases. In the context of time-series motif extraction, Chiu et al. have implemented an efficient algorithm based on random projections, initially proposed by Buhler and Tompa to find motifs in nucleotide sequences [?]. Although they only deal with one-dimensional time-series and thus do not address the issue of heterogeneous multivariate time-series, this projection algorithm is of interest because it rapidly extracts approximate results and remains efficient even in the presence of noise or "don't care" symbols. However, this method, as implemented in [7], does not allow for stretching in time between motif instances.
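To make the projection idea concrete, here is a minimal Python sketch of random-projection collision counting on a single, already discretized symbolic sequence, in the spirit of the Buhler and Tompa algorithm used by Chiu et al. It is not the multivariate extension proposed in this work. The function name `random_projection_motifs` and its parameters (`w` for the word length, `k` for the number of retained positions, `iterations`, `min_count`) are illustrative assumptions, and skipping pairs of overlapping words is just one simple way of discarding trivial matches.

```python
import random
from collections import defaultdict
from itertools import combinations


def random_projection_motifs(symbols, w, k, iterations, min_count, seed=0):
    """Collision counting over random projections of length-w symbolic words.

    Returns {(i, j): count} for pairs of non-overlapping word start positions
    that fell into the same bucket in at least `min_count` iterations.
    """
    rng = random.Random(seed)
    words = [tuple(symbols[i:i + w]) for i in range(len(symbols) - w + 1)]
    collisions = defaultdict(int)
    for _ in range(iterations):
        # Keep k randomly chosen positions; the other w - k act as "don't care".
        mask = sorted(rng.sample(range(w), k))
        buckets = defaultdict(list)
        for start, word in enumerate(words):
            buckets[tuple(word[p] for p in mask)].append(start)
        for members in buckets.values():
            # Starts are appended in increasing order, so i < j here.
            for i, j in combinations(members, 2):
                if j - i >= w:  # skip trivial matches between overlapping words
                    collisions[(i, j)] += 1
    return {pair: c for pair, c in collisions.items() if c >= min_count}
```

Pairs that collide in most iterations are candidate motif instances. Because each projection ignores w - k positions, words that differ only in a few symbols can still collide, which is what makes the approach tolerant to noise; like the one-dimensional method discussed above, this fixed-length sketch does not handle stretching in time.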

Our objective is then to extend feature mining and learning from time-series to the unsupervised extraction of heterogeneous multivariate time-series motifs for learning a behavioral profile. We extend the use of the projection algorithm for feature mining to our noisy, heterogeneous, and multidimensional context. The aim is to extract the most relevant features (that is, subsequences in the time-series domain) to feed into a clustering algorithm for motif identification. As an application, we focus on profiling the daily living habits of a person from data recorded using a set of sensors installed in their home.

3 A methodology for mining heterogeneous multivariate time-series

3.1 Problem entity 

Pattern extraction in home health telecare 

Building any complex decision-making system requires a precise specification of the purpose and context of the decision. Dealing with several levels of detail (different levels of knowledge, data accuracy, or decision) particularly requires carefully setting the needs, requirements, and constraints of the system, so that the decision making matches the defined purpose as closely as possible. Setting up a decision-making system, such as motif extraction, should therefore be considered as part of a problem-solving scheme [?], including the following steps (see figure 1):

1. Defining the context and the general purpose of the decision issue. This aims at narrowing and specifying the space of information and knowledge to consider, by answering questions like: which relevant observations should be set up? Which level of detail should be considered? What performance is expected from the problem solving?

2. Collecting or generating data related to that context. This data collection is guided by contextual information related to the general purpose and context of the decision issue. Collecting large sets of representative data may be quite challenging in some applications. Setting up a simulation process is then very useful, as a first step towards a decision system, to compensate for the lack of experimental data. Simulation also provides a full view of the data potentially recorded in the context of study, by varying the parameters of the simulation process, so that the performance of decision-making systems can be better evaluated.

3. Testing appropriate methods of decision making to solve the problem. Data collected from experiments or generated by a simulation process are used as inputs to the decision-making system. The sensitivity and specificity of these methods must match the problem requirements.

Once a decision process has been implemented and tested, the degree of match between the outputs of the decision-making process and the problem requirements may call for refining one or more steps in order to obtain better sensitivity and specificity. For instance, more precision in the values of the observed parameters may be required in case of low sensitivity, or conversely in case of low specificity. The problem-solving scheme must also integrate validation at each stage, such as face validation with experts, or mathematical and statistical validation.

Defining the problem entity, that is, the context and purpose of the decision, has consequences on both setting up the experimental context and building the decision-making system. Considering the decision-making purpose and its complexity, we can specify some key parameters involved at these stages of any problem-solving scheme, as follows:

• Level of knowledge available. This aims at identifying all knowledge possibly available and relevant to decision making, that is, (a) a priori knowledge, including intuitive and academic knowledge, and (b) knowledge extracted from experimental data sets. Fusion of several kinds of knowledge is commonly used to deal with complexity and heterogeneity. A lack of knowledge related to a specific issue may also require relying on various informational sources.

• Level of detail required. Specifying the level of detail required to deal with an issue is crucial to select (a) appropriate knowledge, and consequently the level of experimental data collection and representation, and (b) relevant algorithms to solve the problem. A compromise must be found between the need to preserve the complexity of phenomena and the restriction to a level of detail relevant to the problem, that is, one meeting the decision's purpose.

• Level of performance expected. The performance is defined using parameters like the sensitivity and specificity of the decision-making system, as well as an acceptable time to decision.

In our context of home health telecare, the issue of extracting patterns representative of a person's daily behavior at home is a high-level issue. The ultimate goal is not to interpret precisely a problem that occurred at home, but to set up the context of occurrence of any change in the behavior. Therefore, the pattern extraction aims at identifying recurrent behaviors occurring at the scale of long time intervals, from about thirty minutes to several hours. The "right levels" to deal with this issue are detailed in the following paragraphs:

• Level of knowledge. Learning about daily behavioral profiles must be performed on an individual basis, since behavioral profiles are specific to a person's physiological status and habits. Consequently, little a priori knowledge is available for our decision issue. The decision-making system is therefore based on a set of data recorded at home and in real time from a set of sensors. In order to gain a full appreciation of the person's condition, sensitive to any change in the health status, data may be collected from different classes of sensors: (1) activity (location, posture, etc.), (2) environment (temperature, use of doors, windows, lighting, etc.), and (3) physiology (heart rate, blood pressure, weight, etc.). The selection of relevant parameters is also constrained by many ethical, social, and individual issues: respect of privacy, confidentiality of data, ease of use, and unobtrusiveness of the input devices installed in the home.

• Level of detail. Since this is a high-level issue, the decision-making system may not require a high degree of detail and accuracy in the data involved in the process, that is, the data collected from sensors in real time. Moreover, the experimental records may require a high level of representation to highlight their global trends, removing minor variations that are insignificant at our observation scale.
• Level of performance. The decision system needs to fit the general purpose of monitoring: the detection of all usual patterns (sensitivity) combined with a low rate of false alarms (specificity), that is, of identified patterns that do not correspond to usual behaviors, with an acceptable time to detection.

3.2 Experimental context 

Some guidelines for pattern extraction 

An appropriate experimental context is set according to the purpose and requirements of the problem. The characteristics of the data produced in that context induce some guidelines for pattern extraction. Looking for meaningful patterns representative of human behaviors (the activities of daily living of a person at home) in heterogeneous data collected from a set of sensors, the decision-making system must be able to address the following issues:

• Multivariate time-series. Relevance for dealing with time-varying objects or situ- 
ations described by several parameters. 

• Heterogeneous components. Capacity of handling qualitative as well as quanti- 
tative parameters in a coherent way to describe an object or a situation. 

• Mixed time-series. Ability to learn from sequences containing both pattern and non-pattern data. Human behaviors captured in daily life indeed contain irregular, occasional motions as well as regular ones.

• Imprecise matches. Capacity to discover "high-level patterns", that is, to focus on the global trends embedded in the data despite the strong presence of noise between the instances of expected patterns, especially when considering human behaviors.

• Outliers. Capacity to preserve the detection accuracy despite the presence of outliers in subsequences corresponding to frequent patterns. These may be due to sensor anomalies or attributed to human failure or disruption.

• Translation in time. Ability to detect patterns translated in time: similar behaviors may occur at any time.

• Stretching in time. Ability to detect patterns of different lengths: when dealing with human behaviors, the same activity does not always last the same amount of time.

3.3 Decision-making system 
Methodology of pattern extraction 

The decision-making system aims at extracting meaningful temporal patterns from heterogeneous multivariate time-series. The patterns should correspond to usual behaviors of a person at home. Given that activities of daily living are specific to a given subject, we need a completely unsupervised learning approach, which is not driven by prior knowledge about the patterns corresponding to living habits. Unsupervised time-series mining usually consists of several consecutive steps performing coarse-to-fine feature extraction [12, 13, 19]. The principle is first to roughly restrict the feature space by combining techniques like time-series representation, random projections, state-based temporal signal modelling, or recurrent neural network training [?], and then to identify more precisely the relevant subsequences for the learning task, based for instance on specific constraints or similarity thresholds on the subsequences. As a consequence, relevant similarity measures must be defined for the given time-series.

Figure 2: Pattern recognition system designed in the context of mining low-level time-series for identifying high-level patterns.

Antunes et al. [?] define temporal data mining as a process including three main steps:

• Representation of temporal sequences. Preprocessing, representation, and mod- 
elling of the data sequences that need to be applied before actual data mining opera- 
tions take place (transformation, discretization, generative models building). 

• Similarity measure for sequences. Definition of an appropriate similarity measure 
according to the characteristics of the time-series. 

• Mining operations. Application of models and representations to the actual mining 
problem (association rules mining, classification, unsupervised clustering, prediction). 

Our approach differs from that of Antunes et al. in the way we specify each of these three steps. Since our issue spans a wide range of levels, from the details embedded in raw data to the decision level, we need to refine the definitions of representing and mining temporal sequences: (a) representation is defined as a step of abstraction, to derive from raw data a level of information that better suits the decision purpose (preprocessing and feature extraction); and (b) mining operations are divided into two consecutive steps, feature mining and clustering, to progressively focus on the features most appropriate to pattern extraction. We thus propose a new general design of pattern recognition systems for large temporal data sets, as shown in figure 2. The following paragraphs refine the three steps defined by Antunes et al. in this context.

Representation 

Considering the extraction of high-level patterns from time-series (see section 3.1), the representation step is not a simple preprocessing of the data. Once represented, time-series must fit the level of detail required by the decision system. We therefore define representation as a step of abstraction of raw data, capturing a higher level of information that is most appropriate to the decision's purpose. The aim is to obtain a synthetic representation of the sequential data that is meaningful in terms of identifying the activities of daily living of a person at home. Given that an activity can be described as a succession of elementary "actions", each performed for a certain duration, we aim to represent raw time-series by sequences of meaningful symbols, each symbol representing the person carrying out a given "action" for a certain time. This implies gathering, along time, successive data records that correspond a priori to the same "action", that is, records whose sequence presents no significant temporal variation. Time-series are then abstracted as sequences of symbols, i.e. multidimensional vectors, each synthesizing a "stationary" state of the monitored parameters during a certain period of time.
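The abstraction of raw records into (vector, duration) symbols can be illustrated with a short Python sketch. It assumes that all components have already been discretized to integer levels and uses a hypothetical greedy rule: a segment is extended as long as every component of the next record stays within `tol` levels of the segment's first record, and the segment is then summarized by its component-wise rounded mean. This rule and the names `abstract_series` and `tol` are assumptions for illustration only; the criterion used in this work relies on the discrete minimum distance discussed later, and a rounded mean is only meaningful for quantitative or ordered components.

```python
from statistics import mean


def abstract_series(records, tol):
    """Greedy run-length abstraction of a multivariate series (hypothetical rule).

    records: list of equal-length tuples of discretized integer levels.
    tol: maximum per-component gap (in levels) allowed between a record and
         the first record of the current segment.
    Returns a list of (representative_vector, duration) symbols.
    """
    symbols = []
    start = 0
    for t in range(1, len(records) + 1):
        # A segment ends at the end of the series or when a record drifts too far.
        ends_segment = t == len(records) or any(
            abs(x - r) > tol for x, r in zip(records[t], records[start])
        )
        if ends_segment:
            segment = records[start:t]
            # The component-wise rounded mean stands in for the "mean vector".
            rep = tuple(round(mean(col)) for col in zip(*segment))
            symbols.append((rep, t - start))
            start = t
    return symbols
```

For example, `abstract_series([(1, 0), (1, 0), (2, 0), (5, 1), (5, 1), (1, 0)], tol=1)` returns `[((1, 0), 3), ((5, 1), 2), ((1, 0), 1)]`: three symbols, each carrying the duration of the "stationary" state it summarizes. Note that Python's `round` rounds exact halves to the nearest even integer.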
Mining operations

Dealing with mixed time-series, which embed both pattern and non-pattern subsequences (see section 3.2), we propose to divide the mining operations into two consecutive steps that progressively better match the decision level:

1. Feature mining. Selection of the most meaningful features (that is, subsequences), so-called tentative motifs, for the purpose of frequent pattern extraction and classification. These subsequences are used as features to feed into a classification algorithm.

2. Clustering. Unsupervised classification of the tentative motifs into meaningful classes, whose representative sequences are called time-series motifs.

In our experimental context, tentative motifs should be representative of a person's repeti- 
tive behaviors, and a motif is then defined as a meaningful class of tentative motifs, repre- 
sentative of any typical activity of the person. Because we need a completely unsupervised 
learning approach, both feature mining and clustering must be unsupervised. 

Similarity measure 

Given this level of complexity, a similarity measure is required at both stages of representing and mining time-series:

1. Representation. The purpose of representation is to synthesize in a single symbol the subsequences whose successive vectors share similar values, possibly for different durations, and which therefore represent the same "action" performed over the corresponding time. We thus need to roughly evaluate the proximity of successive vectors and whether they can be abstracted into one symbol describing a single, continuous type of action. For the sake of robustness and efficiency, a discretization step is first applied to quantitative parameters. Since we are looking for global trends, a rough approximation of the actual distance between a subsequence of vectors and its corresponding "mean vector" is sufficient. If these subsequences are roughly similar, we can assume that the "mean vector" represents the subsequence well for the corresponding duration. As a consequence, we propose to use a discrete minimum distance for that purpose (see section 4.3).

2. Mining operations. Once the possible locations of patterns have been identified within the original time-series, a similarity measure between these subsequences is required for (a) feature mining, to decide whether or not they are similar enough to be considered as a possible pattern, and (b) clustering, to classify all the extracted subsequences into groups that are meaningful in terms of characterizing the activities of daily living. At this stage we need to compute an actual distance relevant to heterogeneous multivariate time-series (see section 4.2).

In that context, specifying a similarity measure then includes defining the following ele- 
ments: (1) homogeneous distance for heterogeneous components, (2) actual distance be- 
tween heterogeneous multivariate time-series, and (3) minimum distance between time- 
series. 

The next sections detail, first, the definition of the required similarity measures and, second, the proposed approach for identifying time-series motifs, including (1) representation of time-series, (2) feature mining, for tentative motif discovery, and (3) clustering.
Figure 3: Pairs of points considered as similar and associated when computing the distance between time-series.

The graphs represent the results of associations when computing (A) the Euclidean distance, (B & B') the Dynamic Time Warping (DTW) distance, and (C) a distance based on the longest common subsequence (LCSS).

• The comparison of point associations between A) and B) shows the better efficiency of the DTW distance over the Euclidean distance in dealing with possible distortions in the time axis.

• The comparison of B') and C) highlights the better efficiency of LCSS over DTW distances in handling the presence of outliers.



4 Similarity measure 

Various similarity models have been successfully used to compare temporal sequences, as illustrated in figure 3 and detailed below. The simplest approach typically used to define a similarity function is based on the Euclidean distance, or on some extensions supporting various transformations such as scaling or shifting. Chiu et al. [7] have used it successfully for extracting one-dimensional time-series motifs in some specific cases. However, this model cannot deal with outliers and is very sensitive to small distortions in the time axis (see figure 3, case A). Another approach is to use the Dynamic Time Warping (DTW) distance, which allows stretching in time and comparing time-series of different lengths [16, 17] (see figure 3, case B). However, outliers still result in very large distances, even though the difference may lie in only a few points (see figure 3, case B'). Non-metric techniques have therefore been introduced and efficiently used to better deal with noisy data [8, 32]. The idea is to capture the intuitive notion that "two sequences should be considered similar if they have enough non-overlapping time-ordered pairs of subsequences that are similar" [?]. This amounts to finding the Longest Common Subsequence (LCSS) between two time-series. This approach allows for outliers, different scaling factors, and baselines (see figure 3, case C).
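The contrast between cases A, B, and B' can be reproduced numerically. The following Python sketch, an illustration rather than the method used in this work, implements the Euclidean distance and a textbook dynamic-programming DTW with an absolute-difference point cost on one-dimensional sequences; the function names and the example series are made up for the purpose.

```python
import math


def euclidean(a, b):
    """Point-to-point distance; requires sequences of equal length (case A)."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs sequences of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Classic O(n*m) dynamic time warping with |x - y| point cost (case B)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


a = [0, 0, 1, 2, 1, 0, 0]
b = [0, 1, 2, 1, 0, 0, 0]   # same shape, shifted by one time step
c = [0, 0, 1, 2, 1, 9, 0]   # identical to a except for a single outlier

print(euclidean(a, b))  # 2.0: the shift alone is penalized
print(dtw(a, b))        # 0.0: warping absorbs the shift
print(dtw(a, c) >= 7)   # True: the outlier must be matched to some point of a
```

The single outlier in `c` costs at least 7 under DTW because every point must be matched to something, which is exactly the weakness that the LCSS approach avoids by allowing points to remain unmatched.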

However, the above works mainly deal with low-dimensional (one- to three-dimensional) time-series and do not address the issue of heterogeneous (quantitative or qualitative) components describing a moving object. Considering heterogeneous multivariate time-series in a particularly noisy context, our objective is to extend the LCSS approach to heterogeneous multivariate time-series. We first need to define a coherent distance between points, whatever the type of parameter. We then detail its integration into computing the actual distance between heterogeneous multivariate time-series, which is used for mining operations. Finally, we use these definitions to extend the approach proposed in [?] for computing a minimum distance between time-series, which is used for time-series representation.



4.1 Homogeneous distance between heterogeneous components 

We would like to allow the description of an object using several parameters of the following 
possible types: 

• Quantitative 

• Ordered qualitative 

• Unordered qualitative 

The simplest way of ensuring the coherence of the similarity measure is to make the distance between two values range from 0 to 1 for each type of parameter. Let a and b be two values of a given parameter, and d(a,b) the distance between these two values. In the case of a qualitative parameter, let v be the number of possible values, which are then encoded as the integers from 1 to v. According to the parameter's type, d(a,b) is defined as follows:

$$d(a,b) = |a - b|, \tag{1}$$

$$d(a,b) = \frac{|a - b|}{v - 1}, \tag{2}$$

$$d(a,b) = \min(|a - b|, 1). \tag{3}$$

Equations (2) and (3) are used respectively for ordered and unordered qualitative parameters. In the case of quantitative parameters, Equation (1) yields a distance between 0 and 1 only after a normalization step that maps the possible values to the range from 0 to 1. We use a min-max normalization, where the minimum and maximum bounds are defined by experts or through statistical analysis of training sets. All values are then restricted to these bounds, lower and upper values being interpreted as noisy or erroneous. Let X_min and X_max be respectively the minimum and maximum bounds for the values x of a given parameter X. We define the normalized value norm(x) of x as follows:

$$\mathrm{norm}(x) = \frac{\max\left(0, \min(x, X_{max}) - X_{min}\right)}{X_{max} - X_{min}}.$$
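A direct transcription of Equations (1) to (3) and of the min-max normalization is sketched below in Python. The parameter-type labels (`"quantitative"`, `"ordered"`, `"unordered"`) and the function names `normalize` and `component_distance` are illustrative assumptions; qualitative values are assumed to be encoded as integers from 1 to v, as in the text, and v must be at least 2 for Equation (2).

```python
def normalize(x, x_min, x_max):
    """Min-max normalization with clipping: values outside [x_min, x_max] map to 0 or 1."""
    return max(0.0, min(x, x_max) - x_min) / (x_max - x_min)


def component_distance(a, b, kind, v=None):
    """Distance in [0, 1] between two values of one parameter.

    kind: "quantitative" (a and b already normalized to [0, 1]),
          "ordered" or "unordered" (a and b integers in 1..v).
    """
    if kind == "quantitative":
        return abs(a - b)            # Eq. (1)
    if kind == "ordered":
        return abs(a - b) / (v - 1)  # Eq. (2), requires v >= 2
    if kind == "unordered":
        return min(abs(a - b), 1)    # Eq. (3): 0 if equal, 1 otherwise
    raise ValueError(f"unknown parameter kind: {kind!r}")
```

For example, with bounds 10 and 40, `normalize(25, 10, 40)` gives 0.5 and any value above 40 is mapped to 1.0; levels 2 and 5 of an ordered parameter with v = 5 are at distance 0.75, while for an unordered parameter any two distinct values are at distance 1.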



4.2 Actual distance between time-series 

The similarity function between trajectories is based on the Longest Common Subsequence (LCSS), already used by Vlachos et al. [?] in the context of multidimensional (generally two- or three-dimensional) time-series of quantitative data. Indeed, noisy data (see section 3.2) have proved to be better handled by non-metric distances based on the LCSS [?] than by metric distances like the Euclidean distance [?] or the Dynamic Time Warping (DTW) distance [16, 17]. Using LCSS, the overall idea is to count the number of pairs of points from two sequences A and B that match according to a predefined matching threshold ε when going through the temporal sequences (see figure 4). One point can never be associated twice with a point of the other sequence, so that the maximum number of associations is the minimum length of the two sequences. Another constant δ controls how far in time we can go in order to match points from one trajectory to the other (see figure 4).

Figure 4: The notion of the LCSS matching within a region of ε.

Comparing the trajectories point to point along the time axis, the pairs both within the gray region can be matched.

We assume objects are points moving in a p-dimensional space $(x_1, \ldots, x_p)$. Let A and B be the two trajectories of moving objects, of sizes n and m respectively:

$$A = ((a_{x_1,1}, \ldots, a_{x_p,1}), \ldots, (a_{x_1,n}, \ldots, a_{x_p,n})),$$

$$B = ((b_{x_1,1}, \ldots, b_{x_p,1}), \ldots, (b_{x_1,m}, \ldots, b_{x_p,m})).$$

For a trajectory A, let Head(A) be the sequence:

$$\mathrm{Head}(A) = ((a_{x_1,1}, \ldots, a_{x_p,1}), \ldots, (a_{x_1,n-1}, \ldots, a_{x_p,n-1})).$$

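Given an integer δ and a real number 0 < ε < 1, the similarity function $\mathrm{LCSS}_{\delta,\varepsilon}(A,B)$ is defined by the recursive algorithm (4) [21], where N and M are the sizes of the sequences A and B respectively at the first step of the recursion:

$$
\mathrm{LCSS}_{\delta,\varepsilon}(A,B) =
\begin{cases}
0 & \text{if } A \text{ or } B \text{ is empty},\\
1 + \mathrm{LCSS}_{\delta,\varepsilon}(\mathrm{Head}(A), \mathrm{Head}(B)) & \text{if } d(a_{x_k,n}, b_{x_k,m}) < \varepsilon,\ \forall\, 1 \le k \le p,\\
 & \quad \text{and } |n - m| \le \delta \text{ and } |N - n - M + m| \le \delta,\\
\max\big(\mathrm{LCSS}_{\delta,\varepsilon}(\mathrm{Head}(A), B),\ \mathrm{LCSS}_{\delta,\varepsilon}(A, \mathrm{Head}(B))\big) & \text{otherwise.}
\end{cases}
\qquad (4)
$$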

Our similarity measure differs from the one proposed by Vlachos et al. [22] in two ways: (1) we have integrated a new temporal constraint on δ, $|N - n - M + m| \le \delta$, to better control how far in time we can go in order to match points, starting from the end of the subsequences; and (2) we have extended the similarity measure to heterogeneous parameters. The constraint on values for similarity is based on the distance between points defined for each type of parameter in the previous section (§4.1). We have also defined a relevant ε threshold on these distances according to the parameter's type, considering that two values of a qualitative parameter are similar only if they are equal:

• Quantitative: 0 < ε < 1,

• Ordered qualitative: ε = 1/(v − 1),

• Unordered qualitative: ε = 1.

Figure 5: The notion of the LCSS matching within a region of δ.

The points of two trajectories can be matched if the time interval between them is under the maximum authorized value δ.

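The number of matches is normalized by the minimum length of the two trajectories, so that the similarity measure ranges from 0 to 1. The distance $D_{\delta,\varepsilon}(A,B)$ between the two trajectories A and B is then defined as follows [22]:

$$D_{\delta,\varepsilon}(A,B) = 1 - \frac{\mathrm{LCSS}_{\delta,\varepsilon}(A,B)}{\min(n,m)}$$

$D_{\delta,\varepsilon}(A,B)$ ranges from 0 to 1, is symmetric, and is zero for identical trajectories. Like other LCSS-based measures, however, it does not satisfy the triangle inequality, so it is not a metric in the strict sense.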
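Recursion (4) can be evaluated bottom-up with a dynamic-programming table instead of exponential recursion. The sketch below is our own illustration under stated assumptions: each point is a tuple of p components, and each component comes with its own distance function and its own ε, following the per-type thresholds listed above. The trajectories in the usage lines are hypothetical.

```python
from typing import Callable, Sequence

Point = Sequence[float]
ComponentDistance = Callable[[float, float], float]


def lcss(A: Sequence[Point], B: Sequence[Point], dists: Sequence[ComponentDistance],
         eps: Sequence[float], delta: int) -> int:
    """Bottom-up evaluation of recursion (4); L[n][m] holds LCSS of the prefixes A[:n], B[:m]."""
    N, M = len(A), len(B)
    L = [[0] * (M + 1) for _ in range(N + 1)]  # row and column 0: one sequence is empty
    for n in range(1, N + 1):
        for m in range(1, M + 1):
            # every component must be closer than its own epsilon
            close = all(d(A[n - 1][k], B[m - 1][k]) < eps[k] for k, d in enumerate(dists))
            # |n - m| <= delta, plus the end-anchored constraint |N - n - M + m| <= delta
            in_time = abs(n - m) <= delta and abs(N - n - M + m) <= delta
            if close and in_time:
                L[n][m] = 1 + L[n - 1][m - 1]
            else:
                L[n][m] = max(L[n - 1][m], L[n][m - 1])
    return L[N][M]


def lcss_distance(A, B, dists, eps, delta) -> float:
    """D = 1 - LCSS / min(n, m), in [0, 1]."""
    if not A or not B:
        raise ValueError("the distance is undefined for an empty trajectory")
    return 1.0 - lcss(A, B, dists, eps, delta) / min(len(A), len(B))


# Hypothetical 2-D trajectories: (normalized activity level, unordered posture code).
dists = [lambda a, b: abs(a - b), lambda a, b: min(abs(a - b), 1)]
eps = [0.1, 1.0]
A = [(0.10, 1), (0.50, 2), (0.90, 3)]
B = [(0.12, 1), (0.52, 2), (0.20, 3)]
print(lcss(A, B, dists, eps, delta=1))  # 2
```

Only the last points differ by more than ε, so two of the three points match and D is 1 − 2/3. Filling the table costs O(nm) comparisons, whereas the naive recursion can branch twice at every mismatch.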
4.3 Minimum distance 

The minimum distance between time-series is a lower approximation of the actual distance, which is useful for getting a rough idea of the similarity between two sequences. A minimum distance of zero between two subsequences means that they can be considered as quite similar. In our context, we use and interpret this information to decide whether a sequence of vectors can be temporally aggregated, that is, whether it is close to the corresponding "mean vector sequence" (see §5.1.2). Chiu et al. [7] have defined a minimum distance to roughly compare discrete one-dimensional quantitative temporal sequences for classification purposes. In our context, such a distance is also of great interest for temporal aggregation, since it indicates whether or not a subsequence can be approximated by its mean vector. We therefore need to extend the minimum distance of Chiu et al. to heterogeneous multivariate time-series.

The definition uses the values of the breakpoints defining the discrete intervals of values. Let $B = \beta_1, \ldots, \beta_{a-1}$ be the sorted list of breakpoints for a given quantitative parameter discretized into a symbols $\alpha_1, \ldots, \alpha_a$ ($\beta_0$ and $\beta_a$ are defined as $-\infty$ and $+\infty$ respectively). A sequence $C = c_1, \ldots, c_n$ of length n can be transformed into a symbolic representation as a word $\hat{C} = \hat{c}_1, \ldots, \hat{c}_n$, where $\hat{c}_i = \alpha_j$ iff $\beta_{j-1} \le c_i < \beta_j$. Following the principle of the Euclidean distance, the minimum distance between the original time-series Q and C, represented by the two words $\hat{Q}$ and $\hat{C}$, so called $\mathrm{mindist}(\hat{Q}, \hat{C})$, is defined by the following equation [7]:

$$\mathrm{mindist}(\hat{Q}, \hat{C}) = \sqrt{\sum_{i=1}^{n} d(\hat{q}_i, \hat{c}_i)^2}$$



The distance function $d(\alpha_i, \alpha_j)$ between two symbols $\alpha_i$ and $\alpha_j$ of a given ordered alphabet corresponding to discretization intervals, $1 \le i, j \le a$, is defined using the values of the corresponding breakpoints, as follows [7]:

$$
d(\alpha_i, \alpha_j) =
\begin{cases}
0 & \text{if } |i - j| \le 1,\\
\beta_{\max(i,j)-1} - \beta_{\min(i,j)} & \text{otherwise.}
\end{cases}
$$

The lookup table used to define the distance between words made up of such symbols is illustrated in table 1.
Table 1: Distance between discrete symbols representing time-series.

Lookup table used to compute the minimum distance between two words for an alphabet of cardinality 4, $\alpha_1, \ldots, \alpha_4$, defined by discretization using the breakpoints $\beta_1 = 0.12$, $\beta_2 = 0.37$, and $\beta_3 = 0.69$. The distance between two symbols can be read off by examining the corresponding row and column. For instance $d(\alpha_1, \alpha_2) = 0$ and $d(\alpha_1, \alpha_3) = 0.25$.
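Because the symbol distance depends only on the breakpoints, a lookup table such as Table 1 can be regenerated programmatically. The following is a small illustrative sketch, not the authors' code: symbols are 1-based indices, and `breakpoints[k - 1]` stores $\beta_k$.

```python
from typing import List, Sequence


def symbol_distance(i: int, j: int, breakpoints: Sequence[float]) -> float:
    """d(alpha_i, alpha_j) for 1-based symbols; breakpoints[k - 1] stores beta_k."""
    if abs(i - j) <= 1:
        return 0.0  # identical or adjacent intervals
    return breakpoints[max(i, j) - 2] - breakpoints[min(i, j) - 1]


def lookup_table(breakpoints: Sequence[float]) -> List[List[float]]:
    """Full a x a table for an alphabet of a = len(breakpoints) + 1 symbols."""
    a = len(breakpoints) + 1
    return [[symbol_distance(i, j, breakpoints) for j in range(1, a + 1)]
            for i in range(1, a + 1)]


for row in lookup_table([0.12, 0.37, 0.69]):
    print([round(x, 2) for x in row])
```

With the breakpoints of Table 1, the rounded rows are [0, 0, 0.25, 0.57], [0, 0, 0, 0.32], [0.25, 0, 0, 0] and [0.57, 0.32, 0, 0]. The table is symmetric with a zero band on and next to the diagonal.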

To extend this notion of minimum distance to multidimensional heterogeneous time-series, we use, for qualitative parameters, the distance function between points defined in §4.1. The lookup tables used to define the distance between sequences of unordered and ordered qualitative symbols are illustrated in tables 2 and 3 respectively.
Table 2: Distance between symbols from an unordered alphabet.

Lookup table used to compute the minimum distance between two words for an unordered alphabet of cardinality 4, $\alpha_1, \ldots, \alpha_4$.



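Let $C = ((c_{1,1}, \ldots, c_{1,p}), \ldots, (c_{n,1}, \ldots, c_{n,p}))$ be a p-dimensional heterogeneous time-series represented by the sequence of symbols $\hat{C} = ((\hat{c}_{1,1}, \ldots, \hat{c}_{1,p}), \ldots, (\hat{c}_{n,1}, \ldots, \hat{c}_{n,p}))$. We use the relevant function $d(\hat{q}_{i,j}, \hat{c}_{i,j})$ according to the type of component j: the breakpoint-based symbol distance above for quantitative components (discretized from normalized values), and equation (2) or (3) for ordered or unordered qualitative components. The minimum distance between two original p-dimensional time-series Q and C, represented as $\hat{Q}$ and $\hat{C}$, so called $\mathrm{mindist}(\hat{Q}, \hat{C})$, is then redefined by the following equation:

$$\mathrm{mindist}(\hat{Q}, \hat{C}) = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{p} d(\hat{q}_{i,j}, \hat{c}_{i,j})^2}$$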
Table 3: Distance between symbols from an ordered alphabet.

Lookup table used to compute the minimum distance between two words for an ordered alphabet of cardinality 3, $\alpha_1 < \alpha_2 < \alpha_3$.
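The heterogeneous minimum distance combines one per-component symbol distance inside a Euclidean-style sum. In this minimal sketch, each component is described by a hypothetical descriptor tuple: `("quant", breakpoints)`, `("ord", v)` or `("unord", v)`. Symbols are 1-based integers, and the two sample words in the usage lines are made up.

```python
import math
from typing import Sequence, Tuple, Union

# Hypothetical component descriptors:
#   ("quant", breakpoints)  discretized quantitative parameter, breakpoints on normalized values
#   ("ord", v)              ordered qualitative parameter with ranks 1..v
#   ("unord", v)            unordered qualitative parameter with modalities 1..v
Component = Tuple[str, Union[Sequence[float], int]]


def component_distance(a: int, b: int, comp: Component) -> float:
    kind, arg = comp
    if kind == "quant":  # breakpoint-based symbol distance
        if abs(a - b) <= 1:
            return 0.0
        return arg[max(a, b) - 2] - arg[min(a, b) - 1]
    if kind == "ord":    # equation (2)
        return abs(a - b) / (arg - 1)
    if kind == "unord":  # equation (3)
        return float(min(abs(a - b), 1))
    raise ValueError(f"unknown component type: {kind}")


def mindist(Q: Sequence[Sequence[int]], C: Sequence[Sequence[int]],
            comps: Sequence[Component]) -> float:
    """Heterogeneous minimum distance between two words of p-dimensional symbols."""
    if len(Q) != len(C):
        raise ValueError("both words must have the same length")
    total = 0.0
    for q_vec, c_vec in zip(Q, C):
        for j, comp in enumerate(comps):
            total += component_distance(q_vec[j], c_vec[j], comp) ** 2
    return math.sqrt(total)


comps = [("quant", [0.12, 0.37, 0.69]), ("ord", 3), ("unord", 4)]
Q = [(1, 1, 2), (2, 2, 3)]
C = [(3, 1, 2), (2, 3, 3)]
print(mindist(Q, C, comps))  # sqrt(0.25**2 + 0.5**2), about 0.559
```

Note that for the "ord" and "unord" components the distance is zero only for equal symbols, whereas for "quant" components it is also zero for adjacent intervals.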

5 Proposed approach for pattern extraction 

In this section we describe the proposed approach for recurrent pattern extraction. The schema of figure 6 summarizes the successive steps identified in section […] for the unsupervised learning of meaningful "high-level patterns" from heterogeneous multivariate time-series. These steps are detailed in the following subsections: (§5.1) representation of time-series, (§5.2) feature mining for tentative motif discovery, and (§5.3) clustering for the final identification of time-series motifs.

5.1 Representation of time-series 

Time-series representation is essential because continuous data are difficult to manipulate directly in an efficient way, especially when they are heterogeneous, high-dimensional and possibly noisy. A suitable representation aims at reducing the dimension of the feature space in order to get efficient feature mining for the learning task. Many time-series representations have been introduced, including the Discrete Fourier Transform (DFT), the Discrete Wavelet Transform (DWT), Piecewise Linear and Piecewise Constant models (PAA, APCA), and Singular Value Decomposition (SVD) (see [2] for an overview).

Since we are looking for "high-level patterns" (see §6.2) within the time-series, that is, patterns corresponding to usual activities of a person at home, the purpose of our representation is to highlight the global trends within the data while removing minor local variations. Our main concern is in fact robustness rather than accuracy of the extracted patterns. To that end, we also try to restrict as much as possible the number of parameters involved in the process. The representation step is thus guided by the purpose of getting a long-term, simple, and meaningful view of the time-series, which in fact corresponds to a step of abstraction.

We perform time-series abstraction in three steps to get a concise representation of the heterogeneous multivariate time-series under study: (1) preprocessing, (2) discretization, and (3) temporal aggregation. Preprocessing the time-series includes filtering, temporal reduction and alignment. Although this step is well known and usual when analyzing data sets, it is also important because it at least partly governs the level of detail of the analysis. The next subsections detail the two remaining steps, discretization and temporal aggregation. Figure 7 illustrates the results of each of these steps on a sample sequence simulated in the context of home health telecare.
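The caption of figure 7 describes preprocessing as smoothing by a mean filter followed by temporal reduction and normalization. A minimal sketch of the first two operations for one quantitative component follows. It assumes a centered window truncated at the ends and block averaging for the reduction; neither choice, nor the function names, comes from the source. Alignment of the different sensors is not sketched.

```python
from typing import List, Sequence


def mean_filter(x: Sequence[float], half_width: int) -> List[float]:
    """Centered moving average over 2 * half_width + 1 samples, truncated at both ends."""
    n = len(x)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def temporal_reduction(x: Sequence[float], factor: int) -> List[float]:
    """Lower the sampling rate by averaging consecutive, non-overlapping blocks of `factor` samples."""
    return [sum(x[i:i + factor]) / len(x[i:i + factor]) for i in range(0, len(x), factor)]


print(mean_filter([0, 0, 3, 0, 0], 1))        # [0.0, 1.0, 1.0, 1.0, 0.0]
print(temporal_reduction([1, 3, 5, 7, 9], 2))  # [2.0, 6.0, 9.0]
```

The window width and the reduction factor together set the level of detail mentioned above: larger values remove more local variation before discretization.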

5.1.1 Discretization 

When dealing with heterogeneous time-series, a symbolic or discrete representation of the lower-level data is useful for building homogeneous data sets for feature mining. This requires the discretization of the continuous components of the time-series. Several methods have already been experimented with, like Piecewise Aggregate Approximation (PAA) to produce equiprobable symbols [7], clustering using k-means […], or Dynamic Local K-means (DLK) [12], which learns the number of classes subject to the constraint that the variance of each class is less than a given $\sigma_0$. Equiprobable symbols are not necessarily relevant for monitoring purposes, since unusual values need to be distinguishable from usual ones. We therefore use the standard k-means technique on experimental data sets to define the discretization intervals for quantitative parameters.

Figure 6: Successive steps of motif extraction from data collected from sensors, including: (1) Representation, (2) Feature mining, and (3) Clustering.
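To show how k-means clusters can yield discretization intervals for one normalized quantitative parameter, here is a sketch. It assumes plain Lloyd iterations with a deterministic initialization on evenly spaced order statistics. It also places each breakpoint at the midpoint between consecutive sorted centroids, which is where nearest-centroid assignment switches in one dimension. The source does not state how breakpoints are derived from the clusters, so this is an assumption, and all names and sample values are illustrative.

```python
import bisect
from typing import List, Sequence


def kmeans_1d(values: Sequence[float], k: int, iters: int = 100) -> List[float]:
    """Plain Lloyd's k-means in one dimension; returns the sorted centroids."""
    xs = sorted(values)
    centers = [xs[(2 * c + 1) * len(xs) // (2 * k)] for c in range(k)]
    for _ in range(iters):
        clusters: List[List[float]] = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda c: abs(x - centers[c]))
            clusters[nearest].append(x)
        new = [sum(cl) / len(cl) if cl else centers[c] for c, cl in enumerate(clusters)]
        if new == centers:  # assignments are stable
            break
        centers = new
    return sorted(centers)


def breakpoints_from_centers(centers: Sequence[float]) -> List[float]:
    """Assumed rule: one breakpoint halfway between consecutive centroids."""
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]


def discretize(x: float, breakpoints: Sequence[float]) -> int:
    """1-based symbol index j such that beta_{j-1} <= x < beta_j."""
    return bisect.bisect_right(breakpoints, x) + 1


values = [0.1, 0.12, 0.11, 0.5, 0.52, 0.9, 0.91]
bps = breakpoints_from_centers(kmeans_1d(values, k=3))
print(bps, [discretize(v, bps) for v in (0.2, 0.6, 0.95)])  # about [0.31, 0.7075] [1, 2, 3]
```

Unlike equiprobable breakpoints, these intervals follow the actual clusters of values, so a rare but distinct range of values keeps its own symbol.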



5.1.2 Temporal aggregation 

Once we get discrete time-series, a further step in reducing the dimension of the feature space is to perform temporal aggregation, where aggregate vectors, also called symbols, are computed over partitions of the time line. The main interest is to get a concise representation of time-series that allows stretching in time between subsequences represented by the same number of symbols, and possibly similar in terms of this aggregated representation. In general, temporal grouping is done by two types of partitioning [22]: (1) span grouping, based on a defined length of time, and (2) instant grouping, which depends on the data stored. Various techniques have already been proposed and applied to problems where several time-series of the same parameters are recorded during overlapping time intervals [24, 26]. The aim is then to summarize the time variations in a single, possibly multidimensional, sequence of values.

In our context, the issue is simpler because it only aims at partitioning one multidimensional sequence of discrete values into a time-stamped sequence of vectors which summarizes the global trends of variation. Span aggregation may be performed while preprocessing the time-series, using fixed-length sliding windows to average the data and possibly reduce the sampling rate at the same time. The choice of sampling rate is important, as it determines the precision and thus the level of detail of the time-series. A more challenging and also common issue is instant aggregation, which depends on the variation of the values in time. Since our interest is in observing global trends in the time-series, we need to compute aggregate vectors within time intervals where there are no significant variations in the multidimensional values (that is, vectors). We use a technique based on a distance threshold, using an extension of the minimum distance between time-series proposed in [7]. The main interest of a minimum distance (see §4.3) is that it easily gives a lower approximation of the actual distance, which can be intuitively interpreted in terms of relative global trends of variation between time-series. To ensure that a sequence is aggregated into one time-stamped vector only when its successive vectors are similar, we allow aggregation only if the minimum distance between the original and aggregated sequences is zero. In the heterogeneous case, a minimum distance of zero means that the values of quantitative parameters are similar along time, that is, within adjacent discretized intervals, and that the values of qualitative parameters are the same.

Figure 7: Abstraction steps performed before analyzing time-series.

Graphs representing some sequences of 4-dimensional time-series for a person monitored at home over one day, including for each graph the following parameters (from top to bottom): (1) moves, (2) postures, (3) activity levels, and (4) mean heart rate.

From left to right, and from top to bottom, the graphs represent: (a) raw data, produced by a simulation process; (b) preprocessed data, that is, raw data smoothed by a mean filter, followed by temporal reduction and normalization; (c) discretized data, the discretization intervals being defined by the k-means technique; and (d) aggregated data.
The aggregate vector $\mathrm{aggr}(\hat{C})$ of a p-dimensional discretized sequence of length n,

$$\hat{C} = ((\hat{c}_{1,1}, \ldots, \hat{c}_{1,p}), \ldots, (\hat{c}_{n,1}, \ldots, \hat{c}_{n,p})),$$

is defined as the discretized mean value computed along time for each component:

$$\mathrm{aggr}(\hat{C}) = (\mathrm{mean}(\hat{c}_{1,1}, \ldots, \hat{c}_{n,1}), \ldots, \mathrm{mean}(\hat{c}_{1,p}, \ldots, \hat{c}_{n,p})).$$

The mean value of a symbolic sequence corresponds to the most represented symbol within the whole subsequence. The following equation expresses the condition required for aggregating the vectors of a subsequence C:

$$\mathrm{mindist}(\hat{C}, \mathrm{AGGR}(\hat{C})) = 0,$$

where $\hat{C}$ is the discretized sequence of C, $\mathrm{AGGR}(\hat{C})$ is a sequence of the same length n made up of the repeated aggregate vector,

$$\mathrm{AGGR}(\hat{C}) = (\mathrm{aggr}(\hat{C}), \ldots, \mathrm{aggr}(\hat{C})),$$

and mindist() is the minimum distance whose definition is extended from the one proposed by Chiu et al. [7]. Starting from the first point of the sequence, we look for the longest time intervals where temporal aggregation is allowed according to the previous definitions. In the end, the original time-series is represented by a sequence of multidimensional vectors, also called symbols, each of them lasting for a specific duration.
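The aggregation rule can be sketched as a greedy left-to-right segmentation. In this sketch, component kinds are given as hypothetical strings, "quant" for discretized quantitative parameters and anything else for qualitative ones. The condition mindist = 0 is tested directly: every component distance to the aggregate vector must be zero, which means at most one interval apart for quantitative symbols and equality for qualitative ones. Two further choices are ours and are not stated in the source: we stop extending an interval at the first failure, and ties for the most represented symbol go to the symbol met first.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def zero_distance(a: int, b: int, kind: str) -> bool:
    if kind == "quant":
        return abs(a - b) <= 1  # adjacent intervals have a symbol distance of 0
    return a == b               # equations (2) and (3) are 0 only for equal values


def aggr(window: Sequence[Tuple[int, ...]], kinds: Sequence[str]) -> Tuple[int, ...]:
    # most represented symbol per component (Counter keeps first-seen order on ties)
    return tuple(Counter(v[j] for v in window).most_common(1)[0][0] for j in range(len(kinds)))


def can_aggregate(window, kinds) -> bool:
    # mindist(C_hat, AGGR(C_hat)) == 0 iff every component distance is 0
    ref = aggr(window, kinds)
    return all(zero_distance(v[j], ref[j], kinds[j]) for v in window for j in range(len(kinds)))


def temporal_aggregation(seq, kinds) -> List[Tuple[int, Tuple[int, ...], int]]:
    """Returns (start index, aggregate vector, duration in samples) for each segment."""
    out, start = [], 0
    while start < len(seq):
        end = start + 1  # a single vector can always be aggregated
        while end < len(seq) and can_aggregate(seq[start:end + 1], kinds):
            end += 1
        out.append((start, aggr(seq[start:end], kinds), end - start))
        start = end
    return out


# Hypothetical 2-D discretized sequence: (quantitative symbol, unordered posture code).
seq = [(2, 1), (3, 1), (2, 1), (4, 1), (4, 2), (4, 2)]
print(temporal_aggregation(seq, ["quant", "unord"]))
```

On this input the six vectors collapse into three time-stamped symbols, (2, 1) for 3 samples, (4, 1) for 1 and (4, 2) for 2. The fourth vector breaks the first segment because its quantitative symbol is two intervals away from the most represented one.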

5.2 Feature mining: tentative motifs discovery 

Feature mining aims at selecting the most relevant features to feed into a learning task, so 
that we reduce the size of the feature space. In the context of time-series motifs discovery, 
the purpose is to extract the most relevant subsequences - the tentative motifs - used as 
input for the final identification and classification of frequent patterns - the motifs. 

Several papers dealing with motif extraction use a kind of feature mining step to first select the potential locations of frequent patterns within the time-series, and then refine the motif identification. For instance, Hong et al. […] have trained recurrent neural networks to extract temporal pattern candidates, which correspond to subsequences where the trained network continuously gives correct one-step predictions. In this paper we extend the probabilistic approach experimented in [7] for tentative motif extraction, illustrated in figure 8. Time-series motif candidates, that is tentative motifs, are identified from random projections of all the possible subsequences extracted with a sliding window from the original, symbolized time-series. The tentative motifs correspond to subsequences that are often hashed into the same bucket using a randomly chosen mask. Each projection step increases the counts in a collision matrix, a square matrix whose size corresponds to the number of all possible subsequences of a predefined length. A large number of collisions is a strong indicator of two similar subsequences, that is, of good candidates for motifs. Since the time-series are discretized into symbols of constant frequency, this method does not allow for stretching in time between motif instances. Moreover, Chiu et al. have implemented the projection algorithm [3] only for one-dimensional real-valued time-series.

In fact, the projection algorithm is interesting because it can roughly identify possible instances of motifs in time-series while allowing some noise and imprecision in the discrete sequences representing the original time-series. Our problem involves several levels of detail from raw data to decision. In this setting, the algorithm acts as a feature mining stage: it extracts the best motif candidates from the original time-series represented as a discrete sequence of symbols. We therefore extend this approach to deal with heterogeneous multivariate time-series. We also change the way the projection algorithm is used, so that we can discover motifs whose instances are of different lengths. Using as input discrete sequences obtained with representation techniques that include temporal aggregation of symbols (see §5.1.2) addresses this issue.

Figure 8: Principle of using the projection algorithm as experimented in [7].

The successive steps are as follows:

(1) A sliding window is defined to extract subsequences $C_i$ from the initial temporal sequence;

(2) Each sequence $C_i$ is converted into its discrete representation $\hat{C}_i$ and placed into matrix S;

(3) A mask is randomly chosen, so that only part of the discrete values is used to project the matrix S into buckets. Collisions are recorded by incrementing the appropriate location in the collision matrix.

The projection algorithm makes it possible to extract subsequences representative of frequent patterns within discrete time-series. This criterion is, however, not enough for feature mining, that is, tentative motif extraction. According to Lesh et al., the criteria for selecting features might depend on the domain and the classifier being used. However, they believe that the following domain- and classifier-independent heuristics are useful for selecting sequences to serve as features: (1) features should be frequent, (2) features should be distinctive of at least one class, and (3) feature sets should not contain redundant features.

(1) The first heuristic is clearly ensured: because they are extracted from the collision matrix produced by the successive projections, the subsequences obtained with the projection algorithm at least partially match another subsequence.

(2) The second one cannot be encoded directly in the projections because this approach to pattern extraction is rough and unsupervised. Having features distinctive of at least one class is then ensured by three additional steps when examining the collision matrix. First, because a large number of collisions is only a strong indicator of similar subsequences, we go back to the original, preprocessed time-series to refine the comparison between pairs of subsequences. Second, because the instances of a motif may be of different lengths in terms of the number of corresponding discrete symbols, each instance may be represented by several successive basic subsequences. Consequently, we propose to extend matching pairs of subsequences while examining the collision matrix, as long as they remain similar, in order to match whole patterns. Third, we add a constraint on the minimum duration of the extracted subsequences to ensure that features are relevant to the person's behavioral profile, and are therefore likely to be instances of a motif.

(3) Finally, the third heuristic requires synthesizing the set of subsequences identified in the previous steps in order to get a set of non-overlapping features, that is, subsequences that are the most representative of each group of overlapping subsequences. The need for synthesizing the results of the collision matrix examination has not been put forward in related papers. This stage then ends feature mining, that is, tentative motif extraction, in accordance with the third heuristic of Lesh et al. [20].

Feature mining then includes three main steps, detailed in the next subsections: 

1. Random projections of the time-series, once these are represented using discretization and aggregation techniques,

2. Collision matrix examination to extract frequent and relevant subsequences in 
terms of identifying motifs, and 

3. Tentative motifs extraction by identifying the most relevant, non-overlapping, 
subsequences from the previous set. 



5.2.1 Time-series random projections 

Time-series random projections produce a collision matrix recording integers that represent the number of matches between all the possible subsequences extracted from the original sequence. A large value in a cell does not guarantee the existence of a corresponding motif, but it is a strong indicator […]. The diagram of figure 9 sums up the principle of random projections, according to the following steps:

(1) Preprocessing. Let C be a p-dimensional sequence of n values recorded regularly over time:

$$C = ((c_{1,1}, \ldots, c_{1,p}), \ldots, (c_{n,1}, \ldots, c_{n,p})).$$

(2) Abstraction. The time-series is first represented using the techniques presented in §5.1: preprocessing, discretization, and temporal aggregation. Sequence C is then represented by a time-stamped sequence of symbols, the p-dimensional vectors $(q_{i,1}, \ldots, q_{i,p})$, $1 \le i \le N$, $N \le n$:

$$\hat{C} = (((q_{1,1}, \ldots, q_{1,p}), t_1), \ldots, ((q_{N,1}, \ldots, q_{N,p}), t_N)),$$

where $(t_1, \ldots, t_N)$ are the ordered instants of symbol occurrence along time.

(3) Random projections

(a) Basic subsequences. Random projections are performed on so-called basic subsequences of a specified length, extracted from the symbolic sequence $\hat{C}$ using a sliding window of size w. This produces a matrix S of size $(N - w + 1) \times w$.
Figure 9: Principle of feature mining, once original time-series are represented.



(b) Projection mask. We randomly select $w_{mask}$ columns of S to act as a mask. In a p-dimensional context we need a random mask of size $w_{mask} \times p_{mask}$, where $w_{mask}$ and $p_{mask}$ are integers such that $0 < w_{mask} < w$ and $0 < p_{mask} < p$. For example, in figure 9, where w = 4 and p = 4, we have randomly selected column number 4 ($w_{mask} = 1$) to act as a mask on symbols, together with parameter number 3 for symbols 1 and 2 and parameter number 2 for symbol 3 ($p_{mask} = 1$).

(c) Collision matrix. The $N - w + 1$ words in the matrix S are hashed into buckets based on their non-masked values. In the example of figure 9, all possible pairs of subsequences are compared based on their 1st, 2nd, and 3rd symbols, considering the 1st, 2nd and 4th parameters for symbols 1 and 2, and the 1st, 3rd and 4th parameters for symbol 3. This produces the collision matrix P of size $(N - w + 1) \times (N - w + 1)$. P is initialized to all zeros, and whenever the words corresponding to the p-dimensional subsequences i and j are hashed to the same bucket, the corresponding cell is incremented: $P(i,j) = P(i,j) + 1$.



We need to repeat the last two steps, (b) and (c), an appropriate number of times so that the collision counts are statistically significant. The relevance of the collision counts also depends on the parameters being well chosen for the purpose of motif extraction.

The key parameters of projection are as follows: (1) number of vectors per subsequence 
(w), (2) number of vectors defining the first dimension of the projection mask (wmask), (3) 
number of parameters defining the second dimension of the projection mask (pmask), (4) 
number of projections performed (proj). 
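Steps (a) to (c) can be put together in a short Python sketch for a sequence of p-dimensional integer symbols. It assumes a fresh random parameter mask for every non-masked symbol position, as in the example above, and it only fills the upper triangle i < j of P. The function name and the fixed `seed` argument are our own conveniences, and the sketch ignores time stamps and trivial matches between overlapping windows.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple

Symbol = Tuple[int, ...]  # one p-dimensional symbol (vector of discrete values)


def collision_matrix(symbols: Sequence[Symbol], w: int, w_mask: int, p_mask: int,
                     proj: int, seed: int = 0) -> List[List[int]]:
    p = len(symbols[0])
    if not (0 < w_mask < w and 0 < p_mask < p):
        raise ValueError("expected 0 < w_mask < w and 0 < p_mask < p")
    # (a) basic subsequences: the N - w + 1 rows of matrix S
    S = [tuple(symbols[i:i + w]) for i in range(len(symbols) - w + 1)]
    P = [[0] * len(S) for _ in range(len(S))]
    rng = random.Random(seed)
    for _ in range(proj):
        # (b) mask w_mask symbol positions, then p_mask parameters of each remaining position
        masked_cols = set(rng.sample(range(w), w_mask))
        kept = []
        for col in range(w):
            if col not in masked_cols:
                masked_params = set(rng.sample(range(p), p_mask))
                kept.append((col, [k for k in range(p) if k not in masked_params]))
        # (c) hash every word on its non-masked values and count collisions (i < j only)
        buckets = defaultdict(list)
        for idx, word in enumerate(S):
            buckets[tuple(word[col][k] for col, params in kept for k in params)].append(idx)
        for members in buckets.values():
            for a in range(len(members)):
                for b in range(a + 1, len(members)):
                    P[members[a]][members[b]] += 1
    return P


A, B, C = (0, 0), (1, 1), (2, 2)  # hypothetical 2-D symbols
P = collision_matrix([A, B, C, A, B, C], w=3, w_mask=1, p_mask=1, proj=10)
print(P[0][3])  # 10: the two identical windows collide at every projection
```

The four parameters of the sketch correspond directly to the key parameters listed above (w, $w_{mask}$, $p_{mask}$, proj). Raising $w_{mask}$ or $p_{mask}$ makes the hashing more tolerant of noise, at the cost of more spurious collisions.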

5.2.2 Collision matrix examination 

Once the collision matrix is significantly filled in, we iteratively examine its values, starting from the largest one, to find promising candidates for motif extraction. We stop the examination when the next value that has not already been examined, and that is not within the scope of previously reported tentative motifs, is lower than a predefined threshold defining the minimum "large enough" number of collisions, the so-called collision threshold. A large number of collisions is only a strong indicator of two similar subsequences, so we go back to the original data to confirm, if possible, that we have found a tentative motif. The original, preprocessed subsequences corresponding to tentative motifs are compared using the similarity measure defined in §4.2 for heterogeneous multivariate time-series. A threshold on this measure, the so-called distance threshold, is used to decide whether or not two subsequences are similar enough to define tentative motifs.

In order to look for whole and significant patterns, independently of the number w of symbols defining each basic subsequence used as input for projections, we perform "pattern growing", as it is called in […] in another methodological context, while examining the collision matrix. Given a pair of discrete basic subsequences that are similar in terms of both collision number and actual distance, we define "pattern growing" as considering extended subsequences that include the basic ones. We try to extend them on their left and right sides, that is, before their first and after their last symbols in time, as long as (1) the number of collisions corresponding to the extended area is still large enough, that is, over the collision threshold, and (2) the distance between the extended original subsequences does not exceed the distance threshold.
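Below is a simplified sketch of the examination loop with pattern growing. It works on the upper triangle of P and takes the actual distance as a hypothetical callback `dist(span_i, span_j)`. Each span is an inclusive range of basic-subsequence indices, and the callback is assumed to compare the corresponding original, preprocessed subsequences with $D_{\delta,\varepsilon}$. Growth follows the diagonal of P one basic subsequence at a time. The neighbourhoods of collisions and the minimum-duration constraint are deliberately left out.

```python
from typing import Callable, List, Sequence, Tuple

Span = Tuple[int, int]  # inclusive range of basic-subsequence indices


def examine_collisions(P: Sequence[Sequence[int]], coll_thr: int,
                       dist: Callable[[Span, Span], float], dist_thr: float
                       ) -> List[Tuple[Span, Span]]:
    n = len(P)
    cells = sorted(((P[i][j], i, j) for i in range(n) for j in range(i + 1, n)), reverse=True)
    covered = set()  # cells already within the scope of a reported pair
    found = []
    for count, i, j in cells:
        if count < coll_thr:
            break  # all remaining cells are below the collision threshold
        if (i, j) in covered or dist((i, i), (j, j)) > dist_thr:
            continue
        si, sj, ei, ej = i, j, i, j
        # grow to the left while collisions and actual distance remain acceptable
        while si > 0 and P[si - 1][sj - 1] >= coll_thr and \
                dist((si - 1, ei), (sj - 1, ej)) <= dist_thr:
            si, sj = si - 1, sj - 1
        # grow to the right under the same two conditions
        while ej + 1 < n and P[ei + 1][ej + 1] >= coll_thr and \
                dist((si, ei + 1), (sj, ej + 1)) <= dist_thr:
            ei, ej = ei + 1, ej + 1
        covered.update((si + k, sj + k) for k in range(ei - si + 1))
        found.append(((si, ei), (sj, ej)))
    return found


P = [[0] * 5 for _ in range(5)]
P[0][2], P[1][3] = 5, 4  # hypothetical counts along one diagonal
print(examine_collisions(P, coll_thr=3, dist=lambda a, b: 0.0, dist_thr=0.5))
```

With a permissive distance, the two high cells merge into the single pair ((0, 1), (2, 3)). A stricter distance callback would stop the growth and report the basic pairs separately.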

Moreover, we define neighbourhoods of collisions to allow for noise between recurrent subsequences and for imprecision in the abstraction step. Indeed, in this noisy and imprecise context, not all the basic subsequences that make up a whole motif instance generate collision values over the predefined threshold when compared with other instances. An adaptive algorithm is therefore triggered by the observation of a high number of collisions, as illustrated in figure 10: (1) a finer identification of recurrent basic subsequences is performed by finding the lowest actual distance, under the maximum threshold, between pairs of basic subsequences in a close collision neighbourhood that satisfies the minimum collision criterion; (2) pattern growing is then performed when the collision criterion is satisfied in the neighbourhood of possible extensions and the actual distance between the extended subsequences is under the maximum threshold.

We thus produce a group of subsequences of different lengths, in terms of symbols and/or number of points of the original subsequences.
Figure 10: Neighbourhoods of collisions used to identify and extend basic recurrent subsequences from the collision matrix.



5.2.3 Tentative motifs extraction 

Feature mining aims at identifying non-redundant features [10], that is, a set of non-overlapping subsequences from the whole original sequence that are the most appropriate for motif extraction. However, the group of subsequences previously extracted by the collision matrix examination may contain overlapping ones, because they are extracted by pairs from areas of high values within the collision matrix. We therefore need to identify relevant groups of subsequences that are well separated in time, so that we can finally define the tentative motifs, one for each group.

Consider a group of k subsequences, each of them overlapping all the other ones of the group and ranging from index $t_{j,1}$ to $t_{j,n_j}$ ($1 \le j \le k$, $n_j > 1$), where $t_{j,1} < t_{j,n_j}$ in the original sequence. The tentative motif representative of this group is defined by the subsequence ranging from index $t^-$ to $t^+$, where:

$$t^- = \min(\{t_{j,1}\}_{1 \le j \le k}) \quad \text{and} \quad t^+ = \max(\{t_{j,n_j}\}_{1 \le j \le k}). \qquad (6)$$

The idea is to choose the collision and distance thresholds restrictive enough that pattern identification and growing during the collision matrix examination only produce significant tentative motifs. However, a tentative motif met several times, as matching different subsequences, may not be extended far enough every time, because of noisy data or imprecision in frequent subsequence identification. The largest subsequence covering all the overlapping subsequences must then be considered as the tentative motif.
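Equation (6) is easy to state in code once subsequences are grouped by overlap. The sketch below represents subsequences as hypothetical inclusive index intervals. It builds overlap-connected groups with a sweep over start indices, so that each subsequence overlaps at least one other of its group and none of another group. It then applies equation (6) to each group. The sweep is our choice of implementation and is not taken from the source, and equation (6) is only meant for groups in which every subsequence overlaps all the others.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # (first index, last index) in the original sequence, inclusive


def overlap_groups(intervals: Sequence[Interval]) -> List[List[Interval]]:
    """Connected components of the 'overlaps' relation between intervals."""
    groups: List[List[Interval]] = []
    for iv in sorted(intervals):
        if groups and iv[0] <= max(e for _, e in groups[-1]):
            groups[-1].append(iv)  # overlaps the current group
        else:
            groups.append([iv])    # well separated from everything seen so far
    return groups


def tentative_motif(group: Sequence[Interval]) -> Interval:
    """Equation (6): from the earliest start t^- to the latest end t^+."""
    return min(s for s, _ in group), max(e for _, e in group)


subsequences = [(22, 30), (0, 5), (3, 8), (20, 25)]
print([tentative_motif(g) for g in overlap_groups(subsequences)])  # [(0, 8), (20, 30)]
```

Taking the union span in this way is what lets a motif instance that was under-extended in one pair be completed by another pair of the same group.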

However, the frequent subsequences identified by the collision matrix examination may not be directly divisible into well-separated groups in which each subsequence overlaps all the other ones. For instance, a subsequence with a large collision number with respect to another subsequence might be accidentally extended too far compared with the actual location of the corresponding motif. Removing this "too long" subsequence from its group of overlapping subsequences may reveal that the group can in fact be divided into two groups of well-separated subsequences. The best way of defining the corresponding tentative motifs is then to discard the longest subsequence and to identify two tentative motifs, one for each well-separated group of subsequences. Possible cases met when examining a group of tentative motifs are illustrated in figure 11.

[Figure 11 legend: symbol; subsequence; definitive and valid class of subsequences; subsequence that does not verify the criteria of a class. Panels: (1) valid class, (2) invalid class containing at least two elements, (3) other invalid class; tentative motifs.]

Figure 11: Illustration of possible cases when examining the validity of a class of tentative motifs.

Figure 12: Illustration of possible divisions of an invalid class containing at least three subsequences.

A given class is considered invalid when at least two of its subsequences, called $k_1$ and $k_2$, do not overlap.

Defining the most relevant tentative motifs from the results of the collision matrix examination thus raises an issue of clustering sets of subsequences. Within each set, every subsequence overlaps at least one other subsequence belonging to the same set, and no subsequence from any other set. Clustering is an unsupervised data analysis technique which seeks to separate data items with similar characteristics into constituent groups. The most common clustering methods are partitioning, hierarchical agglomerative and hierarchical divisive ones […]. Agglomerative techniques usually start with single-member clusters, whereas divisive methods begin with all cases in one large cluster, which the divisive algorithm then subdivides until some tests are satisfied. In theory, the subdivision could continue until there are as many clusters as objects, each containing one object, but in practice it usually stops at an earlier stage. Divisive methods are, however, more expensive than agglomerative ones.

We use a hierarchical divisive clustering approach to select the best tentative motifs to be defined from a group of overlapping subsequences extracted after examining the collision matrix. The criteria available for obtaining clusters are indeed suited to this type of iterative algorithm, which starts from one cluster that is gradually broken down into smaller and smaller clusters [H]. At each iteration, the next class to be divided is chosen, and the process is repeated until a given stop criterion is satisfied. The crucial elements to be defined are: (1) the criterion for selecting the next class to be divided, (2) the method for dividing this class, and (3) the criterion for stopping the successive divisions. The groups of subsequences that may need to be divided are built from the set of subsequences extracted when examining the collision matrix, so that any subsequence overlaps (a) at least one other subsequence from its group, and (b) no subsequence from any other group. The purpose of clustering the subsequences of each group is to reach groups in which any subsequence overlaps (a) all other subsequences of its group, and (b) no subsequence from any other group. This purpose defines the stop criterion for dividing the groups of subsequences. If a group of subsequences does not satisfy these constraints, some of its subsequences are not relevant and need to be removed from the set of frequent subsequences.

1. Criteria for selecting the class to be divided. The selection of the next groups to be divided depends on the subsequences that do not overlap within the initial group. The criterion used to choose the "best" division, that is, the "best" subsequence to remove from the group so that all remaining subsequences overlap each other, is to end the divisions with the groups of subsequences that best represent similar trends in the time-series. This is interpreted as removing the smallest number of subsequences from the initial set. The "best" division is then determined a posteriori by considering all possible divisions. To prevent the algorithm from reaching an exponential running time, some optimization criteria are used to drive the selection of the best division a priori, and to define what an "acceptable" division is, that is, an acceptable rate of removed subsequences.

2. Method of dividing a class. At each step of dividing a class, three cases are possible: (1) every subsequence overlaps all the other subsequences of the class, so that no division is required; (2) the class contains only two subsequences, which do not overlap, so that the division consists in building two classes containing one subsequence each; (3) the class contains at least three subsequences, including at least two that do not overlap. In this third case, one or more subsequences need to be removed from the class in order to satisfy the criteria required to stop the divisions. Let $k_1$ and $k_2$ be two non-overlapping subsequences of a class. The division can be performed in the following ways (see figure 12): (a) removing $k_1$ and $k_2$, along with other subsequences of the group; (b) removing $k_1$; (c) removing $k_2$; or (d) keeping both $k_1$ and $k_2$. Disregarding one or more subsequences may divide the original group into groups that are well separated in time. We then apply the division algorithm to the corresponding new group(s) of subsequences.

3. Criteria to stop the successive divisions. We stop this recursive process of 
division when the purpose of clustering the subsequences of a group is reached, that 
is: any subsequence overlaps (a) all other subsequences of its group, and (b) no 
subsequence from any other group. 
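The three elements above can be illustrated with a small recursive sketch. It assumes each subsequence is summarized by its (start, end) interval and replaces the paper's a-priori optimization criteria with an exhaustive search that minimizes the number of removed subsequences, so it is exponential in the worst case. Option (d) is interpreted, as our own assumption, as keeping $k_1$ and $k_2$ while removing the subsequences that overlap both. The names `divide`, `components`, and `is_valid` are hypothetical.

```python
from itertools import combinations

# Hypothetical representation: a subsequence is its (start, end) index pair, inclusive.
Interval = tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def is_valid(group: list[Interval]) -> bool:
    """Stop criterion: every subsequence overlaps all the others."""
    return all(overlaps(a, b) for a, b in combinations(group, 2))


def components(group: list[Interval]) -> list[list[Interval]]:
    """Split a group into sets of subsequences that are well separated in time."""
    remaining = list(group)
    comps = []
    while remaining:
        stack, comp = [remaining.pop()], []
        while stack:
            x = stack.pop()
            comp.append(x)
            linked = [y for y in remaining if overlaps(x, y)]
            for y in linked:
                remaining.remove(y)
            stack.extend(linked)
        comps.append(sorted(comp))
    return comps


def divide(group: list[Interval]) -> tuple[list[list[Interval]], int]:
    """Return (classes, number of removed subsequences), minimizing removals.

    Exhaustive search: exponential in the worst case.
    """
    group = sorted(group)
    if is_valid(group):                      # case (1): no division required
        return [group], 0
    if len(group) == 2:                      # case (2): one subsequence per class
        return [[group[0]], [group[1]]], 0
    # Case (3): take a pair of non-overlapping subsequences k1, k2.
    k1, k2 = next((a, b) for a, b in combinations(group, 2) if not overlaps(a, b))
    # Assumed reading of option (d): keep k1 and k2, drop what overlaps both.
    bridge = [s for s in group if s not in (k1, k2) and overlaps(s, k1) and overlaps(s, k2)]
    options = [[k1, k2], [k1], [k2]] + ([bridge] if bridge else [])
    best = None
    for removed in options:
        rest = [s for s in group if s not in removed]
        classes, n_removed = [], len(removed)
        for comp in components(rest):        # recurse on each separated group
            sub_classes, sub_removed = divide(comp)
            classes += sub_classes
            n_removed += sub_removed
        if best is None or n_removed < best[1]:
            best = (classes, n_removed)
    return best
```

For instance, in a group where one subsequence bridges two otherwise separate pairs, removing that single bridging subsequence yields two valid classes at the cost of one removal, which beats removing either end of the chain.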

Some results of identifying tentative motifs from the set of subsequences extracted from the collision matrix are presented in figure 13.

Figure 13: Tentative motifs extraction.
Successive steps to identify tentative motifs from the set of frequent subsequences identified after collision matrix examination, using a divisive clustering method.

Figure 14: Ascending Hierarchical Classification.

The tentative motifs numbered from (a) to (e) are classified into two classes ({a,b,c} and {d,e}) according to the initial distance matrix $D_0$ and the distance threshold $d_{max} = 0.4$. The corresponding subsequences are displayed on the bottom graphs. They have been generated by a simulation process in the context of home health telecare, including four parameters: (1) the moves of a monitored person, (2) their postures, (3) the activity level, and (4) the mean heart rate.

Figure 15: Computation of the mean representative sequence of a class.

The figure above shows how to obtain the representative of the three one-dimensional sequences of a given class. The computation starts from the reference sequence, that is, the one whose length is closest to the mean sequence length. Each of its points is averaged with the similar points from the other sequences of the class.



5.3 Clustering: time-series motifs identification 

The last step is the clustering of tentative motifs into classes representative of typical "behaviors". At this step, we need a classification method based on an accurate distance measure, that is, an actual distance between the original, preprocessed subsequences corresponding to the identified tentative motifs. We therefore use the similarity measure defined earlier in this paper for heterogeneous multivariate time-series. Our purpose is to cluster subsequences into groups whose elements are all close to each other, that is, their pairwise distances are less than a given distance threshold. We therefore use an ascending hierarchical classification built from the table of distances between all the tentative motifs, as illustrated in figure 14. This is an agglomerative technique that starts with single-member clusters, the tentative motifs, which are successively gathered into classes according to a distance threshold. For the sake of homogeneity and robustness, we use the same distance threshold as when examining the collision matrix. The distance between two classes is defined as the maximum distance observed between all possible pairs of subsequences, one from each class. This ensures that we never gather classes containing subsequences whose distance exceeds the distance threshold. An additional constraint on the size of any class of tentative motifs is also added when clustering tentative motifs, to reinforce the relevance of the extracted patterns.
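A minimal sketch of this complete-linkage agglomeration follows, assuming the distances between tentative motifs have already been computed into a symmetric matrix. The `min_size` filter stands in for the unspecified class-size constraint, and its default of 2 is an assumption; the toy matrix `D_TOY` is hypothetical and is not the $D_0$ of figure 14.

```python
import numpy as np


def cluster_tentative_motifs(D, d_max, min_size=2):
    """Complete-linkage agglomerative clustering stopped by a distance threshold.

    D        -- symmetric matrix of distances between tentative motifs
    d_max    -- two classes merge only if every cross pair is within d_max
    min_size -- hypothetical minimum class size (stand-in for the size constraint)
    """
    D = np.asarray(D, dtype=float)
    clusters = [[i] for i in range(len(D))]  # single-member clusters
    while True:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Distance between classes: maximum over all cross pairs.
                d = D[np.ix_(clusters[a], clusters[b])].max()
                if d <= d_max and (best is None or d < best[0]):
                    best = (d, a, b)
        if best is None:                     # nothing left that can be merged
            break
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]                      # b > a, so index a is unaffected
    return [c for c in clusters if len(c) >= min_size]


# Hypothetical toy distances: motifs 0-2 are mutually close, as are motifs 3-4.
D_TOY = [
    [0.0, 0.1, 0.3, 0.9, 0.8],
    [0.1, 0.0, 0.2, 0.7, 0.9],
    [0.3, 0.2, 0.0, 0.8, 0.9],
    [0.9, 0.7, 0.8, 0.0, 0.2],
    [0.8, 0.9, 0.9, 0.2, 0.0],
]
print(cluster_tentative_motifs(D_TOY, d_max=0.4))  # [[0, 1, 2], [3, 4]]
```

Using the maximum rather than the minimum pairwise distance is what guarantees that no two members of a final class are farther apart than the threshold.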

Once the tentative motifs are clustered into meaningful groups, a mean representative subsequence corresponding to each class, that is, to each motif, is computed, as illustrated in figure 15. This representative sequence is based on the so-called reference subsequence, whose duration is the closest to the mean duration of all the subsequences of the group. Since the similarity measure is based on LCSS, computing it between any subsequence of the group and the reference one provides, for each vector of the reference subsequence, a set of similar vectors associated with it. The representative subsequence is then defined with the same length as the reference sequence, by replacing each vector with the mean of this vector and its associated similar vectors.
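The construction of the representative can be sketched as follows for the one-dimensional numeric case of figure 15. Two points are assumed similar when their values differ by at most `eps` and their positions by at most `delta`, which is a standard LCSS matching rule rather than the paper's heterogeneous multivariate measure; `lcss_pairs` and `representative` are hypothetical names.

```python
import numpy as np


def lcss_pairs(a, b, eps, delta):
    """Index pairs (i, j) matched by a standard LCSS alignment of a and b."""
    n, m = len(a), len(b)
    L = np.zeros((n + 1, m + 1), dtype=int)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) <= eps and abs(i - j) <= delta:
                L[i, j] = L[i - 1, j - 1] + 1
            else:
                L[i, j] = max(L[i - 1, j], L[i, j - 1])
    pairs, i, j = [], n, m                   # trace back one optimal alignment
    while i > 0 and j > 0:
        if abs(a[i - 1] - b[j - 1]) <= eps and abs(i - j) <= delta:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif L[i - 1, j] >= L[i, j - 1]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


def representative(seqs, eps, delta):
    """Mean representative sequence of a class, built on the reference sequence."""
    lengths = [len(s) for s in seqs]
    mean_len = sum(lengths) / len(lengths)
    # Reference: the sequence whose length is closest to the mean length.
    r = min(range(len(seqs)), key=lambda k: abs(lengths[k] - mean_len))
    ref = np.asarray(seqs[r], dtype=float)
    sums, counts = ref.copy(), np.ones(len(ref))
    for k, s in enumerate(seqs):
        if k == r:
            continue
        for i, j in lcss_pairs(ref, s, eps, delta):
            sums[i] += s[j]                  # accumulate similar points
            counts[i] += 1
    return sums / counts                     # same length as the reference
```

Because the output keeps the reference length, stretched instances of the motif contribute to the average without changing its duration.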

6 Experimental Results 

The approach proposed for extracting multidimensional and heterogeneous patterns is evaluated in the context of home health telecare. In this section, we first define the experimental context used to test pattern extraction. Testing our approach also requires defining an experimental process, including relevant measures for evaluating the system's performance. Finally, we discuss the quality of the method and of its results.

6.1 Experimental context: home health telecare 

When monitoring a person at home, the aim is to learn the person's lifestyle in order to build a kind of behavioral profile that is sensitive to any critical deviation. The monitoring system is based on a set of data, recorded at home in real time, that may be collected from different classes of sensors: (1) activity (location, position, motion, fall, etc.), (2) environment (temperature, use of doors, windows, lighting, etc.), and (3) physiology (blood pressure, weight, etc.). In defining these observable parameters, a compromise must be found between (a) being easily observable and non-invasive, by focusing on a small set of parameters, and (b) gaining a full appreciation of the person's condition, sensitive to any change in the health status. A deterioration of a person's health status usually entails behavioral disorders whose observable symptoms range from an increased risk of falls, slowness in executing simple actions, and forgetfulness in daily activities, to a global decrease in the person's ability to perform activities of daily living (ADL). Clinical practice already widely exploits this correlation by estimating a patient's health status in terms of their ability to perform ADL such as washing, dressing, or feeding themselves. The usefulness of monitoring parameters related to the activity of a person is often underlined as an essential part of any health evaluation [UEHl), and several projects in home health telecare j2S | IHl] have already integrated the assessment of ADL into their concept. Representative of both activity and health status, the heart rate is another important and easily observable physiological measure [25].

As a first experimental step, we therefore consider four parameters that can be derived from a set of sensors and that are representative of both the heart rate and the activity of a person at home:

• Moves: qualitative, unordered parameter, defining the room occupied by the person 
at any time. The moves of the person are recorded through infrared motion sensors 
installed in each room. 

• Postures: qualitative parameter, ordered according to the effort required by the 
posture ("lying down", "sitting", and "standing"). The postures are inferred from 
data provided by a set of accelerometers worn by the person. 

• Activity level: quantitative parameter, in an arbitrary unit. The activity level is measured by a portable accelerometer worn on the chest and estimated through the body acceleration along the anterior-posterior axis.

• Mean heart rate: quantitative parameter, in beats per minute. The mean heart rate is computed from the data recorded by a portable ECG recorder.

Thus, we consider a set of heterogeneous parameters for mining meaningful patterns repre- 
sentative of a person's usual behaviors. 

6.2 Experimental process 

Setting up an experimental process requires properly defining (1) which data sets are appropriate to the context and purpose of the decision-making system, (2) which methods are required to build a full experimentation, and (3) which performance measures are the most relevant to objectively evaluate the robustness and efficiency of the system.

6.2.1 Experimental data sets 

Collecting experimental sets from a simulation process 

The study of any decision-making process requires realistic and accurate data collection. Research projects on home health telecare are still at an early stage of development, and the collection of data in realistic environments has only just started. Moreover, a full study entails considering several profiles of people facing many types of situations. Collecting complete and representative data sets may therefore be quite hard, especially for data corresponding to rare events. For these reasons, many researchers have turned to simulation as a way to overcome the difficulty of collecting large, complete experimental data sets. For experimentation, a simulation process gives researchers a complete and tightly controlled universe of data, obtained by varying the simulation parameters.

For these reasons, we have set up a simulation process for generating realistic sequences corresponding to the experimental parameters JJ: moves, postures, activity level, and mean heart rate. The simulation process is designed to preserve the complexity of the problem, especially the joint variations of the parameters. The sequences produced by simulation must also be representative of a person's habits at home, meaning that every day they include subsequences corresponding to basic activities of daily living such as sleeping, eating three times a day, and getting washed. Considering these requirements and the relative influences of the simulated parameters, the simulation model is defined using a cascade structure, and run in four steps to successively generate time-series corresponding to: (1) the moves of the subject in a given period of time, (2) their successive postures, (3) the sequences of activity levels, and (4) the values of the mean heart rate ^]. A sample of data produced by the simulation process over one day is shown in figure 16. Our objective is then to identify from these sequences of data "high level patterns" corresponding to usual behaviors of the person at home, especially in the presence of noise.

Defining appropriate sequences for experimenting pattern extraction 

In order to validate our approach to pattern extraction, we need to objectively check the relevance of the motifs extracted from time-series. This requires knowing a priori which motifs are represented within the time-series and where their instances are located, so that we can evaluate the performance of the system. The aim is then to introduce instances of predefined motifs into sequences that initially contain no pattern. Given the simulation method, time-series that contain no significant pattern while remaining realistic in terms of the joint variations of the parameters are obtained as follows:
Figure 16: Sequences produced by the simulation process over one day, including four parameters that are, from top to bottom: (1) moves, (2) postures, (3) activity levels, (4) mean heart rate. Data are available every minute.



• First, random moves are generated, so that the data include no pattern representative of the person's activities of daily living.

• Second, relevant sequences of values are simulated for the other parameters according to these random moves, so that the multidimensional sequences remain realistic.

We also need to identify realistic motifs for the monitoring of a person at home. This time, the simulation process is used to generate sequences corresponding to living habits, in which we can randomly select subsequences that can be interpreted as the person performing a typical activity for a given time. The length of the subsequences is randomly selected within meaningful bounds (from 30 minutes to 2 hours, for instance). We also add a constraint on the selected subsequences to ensure they can be interpreted as the person performing a given activity, and not only one or two elementary actions: selected subsequences must be represented, using the representation step defined for motifs extraction (see section 5.1), by a sequence of at least 4 symbols.

The last step is then to create and introduce some instances of these predefined motifs into the non-pattern sequences previously generated. The introduction of motifs' instances is guided by the specific characteristics of patterns identified in our experimental context (see section 3.2). We especially need to take into account possible imprecise matches, outliers, translation and stretching in time between instances of the same motif. We therefore define several types of noise that can be introduced into a subsequence representing a motif to obtain instances of the same motif:

• Noisy values. Since we consider sequences representative of human behaviors, instances of the same motif match only in their global trends. We may therefore introduce a high rate of noise in the values.

• Interruptions. Any main activity of daily living may be interrupted by a secondary task such as answering the phone or going to the toilet. As long as this additional task remains short, we would still like to recognize the main task. This requires testing the algorithm with motif instances that include short sequences of outliers, representing disruptions in the global trend of variation.

• Stretching in time. Since we deal with human behaviors, the same activity does not always last the same duration. We therefore introduce instances of different lengths corresponding to the same motif.

• Translation in time. Even when usual at home, an activity is not always performed at exactly the same time. We therefore introduce instances of the same motif anywhere in the non-pattern time-series.
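The four kinds of noise can be combined in a short generator. This is a sketch for a single quantitative dimension (for example the activity level); stretching by linear resampling, Gaussian value noise, a constant-valued interruption, uniform random placement, and all names and default values are our assumptions, not the paper's simulation model.

```python
import numpy as np


def make_instance(motif, rng, stretch=(0.8, 1.25), noise_sd=0.1,
                  outlier_len=3, outlier_value=0.0):
    """Noisy, stretched, interrupted copy of a one-dimensional motif."""
    motif = np.asarray(motif, dtype=float)
    # Stretching in time: linear resampling to a random new length.
    new_len = max(2, int(round(len(motif) * rng.uniform(*stretch))))
    inst = np.interp(np.linspace(0, 1, new_len), np.linspace(0, 1, len(motif)), motif)
    # Noisy values: additive Gaussian noise.
    inst = inst + rng.normal(0.0, noise_sd, new_len)
    # Interruption: a short run of outliers replacing part of the trend.
    if 0 < outlier_len < new_len:
        start = rng.integers(0, new_len - outlier_len + 1)
        inst[start:start + outlier_len] = outlier_value
    return inst


def insert_instance(background, inst, rng):
    """Translation in time: overwrite a random window of a non-pattern series."""
    out = np.asarray(background, dtype=float).copy()
    pos = int(rng.integers(0, len(out) - len(inst) + 1))
    out[pos:pos + len(inst)] = inst
    return out, (pos, pos + len(inst) - 1)   # inclusive ground-truth interval


rng = np.random.default_rng(0)
series, location = insert_instance(np.zeros(100), make_instance(np.arange(10.0), rng), rng)
```

Returning the inclusive interval keeps the ground truth needed to score the extraction point by point.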

6.2.2 Experimental method 

Setting up an experimental process is relevant for evaluating the system's performance at two levels:

1. Quality of the method: this includes validating each stage of the proposed approach and defining appropriate values for the required parameters. We are especially interested in determining the sensitivity of each parameter, and how relevant values can be identified and validated.

2. Quality of the results: once the system is properly set up, the aim is to evaluate its performance, especially under a strong presence of noise. We aim to study the influence of introducing highly noisy instances of given motifs, considering all possible kinds of noise: variability in values, outliers, stretching and translation in time.

6.2.3 Performance measures 

Setting up an experimental process includes defining appropriate performance measures. In our context, the objective evaluation of the robustness and efficiency of our approach to motifs extraction is performed at two levels: (1) the identification of frequent subsequences within time-series containing both pattern and non-pattern signals, that is, the performance of tentative motifs extraction; and (2) the classification of these subsequences into motifs, that is, the performance of clustering tentative motifs. We then define ways of evaluating the sensitivity and specificity of these two stages of motifs extraction.

Tentative motifs extraction 

Defining measures of sensitivity and specificity for this stage aims at evaluating the ability of the algorithm (a) to identify as tentative motifs the subsequences corresponding to instances of motifs (sensitivity), and (b) not to define as tentative motifs subsequences that actually correspond to non-pattern signals (specificity). Sensitivity (Se) and specificity (Sp) are computed from the numbers of true/false positives/negatives (TP, FP, TN, FN), considering, for each point of the original sequence, one of the two complementary hypotheses: "the point belongs to an instance of a motif" and "the point does not belong to an instance of a motif".



$$Se = \frac{TP}{TP + FN} \qquad \text{and} \qquad Sp = \frac{TN}{TN + FP}$$
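As a sketch, the point-wise counts can be obtained from two boolean masks over the original sequence: one marking the points that truly belong to a motif instance, and one marking the points covered by an extracted tentative motif. The inclusive-interval input and the function names are assumptions.

```python
import numpy as np


def interval_mask(length, intervals):
    """Boolean mask of the points covered by inclusive (start, end) intervals."""
    mask = np.zeros(length, dtype=bool)
    for start, end in intervals:
        mask[start:end + 1] = True
    return mask


def pointwise_se_sp(truth, predicted):
    """Se = TP / (TP + FN) and Sp = TN / (TN + FP), counted point by point."""
    truth = np.asarray(truth, dtype=bool)
    predicted = np.asarray(predicted, dtype=bool)
    tp = np.sum(truth & predicted)
    fn = np.sum(truth & ~predicted)
    tn = np.sum(~truth & ~predicted)
    fp = np.sum(~truth & predicted)
    return tp / (tp + fn), tn / (tn + fp)
```

For a true instance covering points 2 to 5 and a tentative motif covering points 3 to 7 in a 10-point sequence, TP = 3, FN = 1, FP = 2 and TN = 4, so Se = 0.75 and Sp = 4/6.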
Clustering into motifs

At this stage, the aim is to evaluate the ability of the algorithm to properly cluster the tentative motifs into motifs, that is, (a) to gather all the instances of the same motif in one class (sensitivity), and (b) to gather in one class only instances of the same motif (specificity). The sensitivity and specificity of algorithms classifying $N$ vectors are determined using a confusion matrix. A confusion matrix $C$ of dimension $m \times n$ represents the results of an algorithm that clusters $N$ vectors, corresponding a priori to $m$ motifs, into $n$ classes. Each value $c_{ij}$ represents the number of vectors belonging to class $j$ that actually correspond to instances of motif $i$. In a confusion matrix, the sum over any row $i$ is the number of instances of motif $i$, and the sum over any column $j$ is the number of vectors in class $j$.
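Such a confusion matrix can be filled from the (motif, class) pair of each clustered vector. A minimal NumPy sketch, with hypothetical names and toy labels:

```python
import numpy as np


def confusion_matrix(motif_labels, class_labels, m, n):
    """c[i, j] = number of vectors of motif i that were placed in class j."""
    C = np.zeros((m, n), dtype=int)
    np.add.at(C, (np.asarray(motif_labels), np.asarray(class_labels)), 1)
    return C


C = confusion_matrix([0, 0, 0, 1, 1], [0, 0, 1, 1, 1], m=2, n=2)
print(C.sum(axis=1), C.sum(axis=0))  # [3 2] [2 3]
```

The row sums give the number of instances per motif and the column sums the number of vectors per class, as stated above.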

Due to the way the tentative motifs are extracted, our context presents some specific features compared with the definition of "simple" performance measures for clustering algorithms.

• The "right" elements to be clustered, that is, all and only the subsequences actually corresponding to motifs, may not be available, depending on the performance of tentative motifs extraction. As a consequence, the sum over each row $i$ may be lower than the number of instances of motif $i$, and the sum over each column $j$ of the confusion matrix lower than the number of elements in class $j$. Moreover, an instance of a motif may be recognized as more than one tentative motif, so that these sums may also be greater than the expected values.

• Because we use unsupervised classification, the number of classes output by the clustering algorithm might differ from the actual number of motifs, that is, $m \neq n$. However, analyses of clustering performance reported in the literature do not usually consider this issue.

These considerations lead to the following definitions of sensitivity and specificity:

• Sensitivity: "All the instances of a motif must be gathered in one class, as unsegmented subsequences, that is, an instance of a motif is associated with only one subsequence of the class."

• Specificity: "All the elements of a class must be representative of the same motif, as unsegmented subsequences, that is, only one element of a class is associated with each instance of the corresponding motif."

Missing some motif instances, recognizing some instances as several subsequences, or failing to properly cluster the tentative motifs decreases these performance measures. The proposed definitions of sensitivity and specificity are implemented using the concept of entropy. A null entropy represents perfect order, which should correspond to the maximum value, 1, of sensitivity and specificity: tentative motifs extraction and clustering are then perfectly done. Sensitivity is related to the correct identification and clustering of motif instances, so it is defined as a measure of entropy over each row $i$ of the confusion matrix. Specificity is related to the homogeneous composition of each class, so it is defined as a measure of entropy over each column $j$ of the confusion matrix. These indexes can then be roughly defined as:

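$$Se_i = 1 + \frac{1}{\log(n)} \sum_{j=1}^{n} \frac{c_{ij}}{m_i} \log\left(\frac{c_{ij}}{m_i}\right) \qquad \text{and} \qquad Sp_j = 1 + \frac{1}{\log(m)} \sum_{i=1}^{m} \frac{c_{ij}}{n_j} \log\left(\frac{c_{ij}}{n_j}\right) \qquad (7)$$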
where $m$ is the number of motifs expected, $n$ the number of classes extracted, $m_i$ the number of instances of motif $i$, and $n_j$ the number of elements in class $j$.

The values $\log(n)$ and $\log(m)$ correspond to the maximum entropy of a system of $n$ and $m$ states respectively, so that the $Se_i$ and $Sp_j$ values lie in $[0, 1]$.
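Equation (7) translates directly into code. The sketch below assumes the convention $0 \log 0 = 0$ (zero cells are skipped), returns 1 when there is a single state (where $\log(1) = 0$ would leave the normalization undefined), and by default takes $m_i$ and $n_j$ as the row and column sums, which is exactly the situation of equation (8). These conventions and the function names are our assumptions.

```python
import numpy as np


def one_minus_norm_entropy(p, k):
    """1 + sum(p log p) / log(k), with 0 log 0 = 0; equals 1 for a pure distribution."""
    if k == 1:
        return 1.0
    p = p[p > 0]
    return 1.0 + float(np.sum(p * np.log(p))) / np.log(k)


def entropy_se_sp(C, m_i=None, n_j=None):
    """Se_i over rows and Sp_j over columns of a confusion matrix, as in (7)."""
    C = np.asarray(C, dtype=float)
    m, n = C.shape
    m_i = C.sum(axis=1) if m_i is None else np.asarray(m_i, dtype=float)
    n_j = C.sum(axis=0) if n_j is None else np.asarray(n_j, dtype=float)
    se = np.array([one_minus_norm_entropy(C[i] / m_i[i], n) for i in range(m)])
    sp = np.array([one_minus_norm_entropy(C[:, j] / n_j[j], m) for j in range(n)])
    return se, sp


se, sp = entropy_se_sp([[2, 2], [0, 3]])
```

Here motif 0 is split evenly over two classes, so its sensitivity drops to 0, while class 0, which only contains motif 0, keeps a specificity of 1.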
For equations (7) to be valid, the following properties must hold:

$$\sum_{j=1}^{n} \frac{c_{ij}}{m_i} = 1 \qquad \text{and} \qquad \sum_{i=1}^{m} \frac{c_{ij}}{n_j} = 1 \qquad (8)$$

However, in our context, we can miss some instances of motifs, or recognize them as divided into several subsequences. Consequently, the values of $\sum_{j=1}^{n} c_{ij}/m_i$ and $\sum_{i=1}^{m} c_{ij}/n_j$ might be either lower (when missing some instances) or greater (when tentative motifs split up) than 1, and we must adapt formulas (7).

In order to deal with missing instances, we introduce a notion of "recognition rate" for each motif $i$, $pe_i$, and for each extracted class $j$, $pp_j$, defined as:

$$pe_i = \frac{\sum_{j=1}^{n} c_{ij}}{m_i} \qquad \text{and} \qquad pp_j = \frac{\sum_{i=1}^{m} c_{ij}}{n_j} \qquad (9)$$

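These parameters are used as weights in computing sensitivity and specificity, so that their values decrease when some instances of a motif are not identified, or when some elements of a class are not representative of a motif's instance. Equations (7) then become:

$$Se_i = pe_i \left(1 + \frac{1}{\log(n)} \sum_{j=1}^{n} \frac{c_{ij}}{\sum_{j'=1}^{n} c_{ij'}} \log\left(\frac{c_{ij}}{\sum_{j'=1}^{n} c_{ij'}}\right)\right) \qquad (10)$$

$$Sp_j = pp_j \left(1 + \frac{1}{\log(m)} \sum_{i=1}^{m} \frac{c_{ij}}{\sum_{i'=1}^{m} c_{i'j}} \log\left(\frac{c_{ij}}{\sum_{i'=1}^{m} c_{i'j}}\right)\right) \qquad (11)$$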
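A sketch of the weighted indexes follows, assuming that each row (column) of the confusion matrix is normalized by its own sum before the entropy is computed, and that the result is multiplied by the recognition rate $pe_i$ ($pp_j$) of (9). A motif with no identified instance, or an empty class, is assumed to score 0. The function names are hypothetical.

```python
import numpy as np


def one_minus_norm_entropy(p, k):
    """1 + sum(p log p) / log(k), with 0 log 0 = 0."""
    if k == 1:
        return 1.0
    p = p[p > 0]
    return 1.0 + float(np.sum(p * np.log(p))) / np.log(k)


def weighted_se_sp(C, m_i, n_j):
    """Entropy indexes weighted by the recognition rates pe_i and pp_j of (9)."""
    C = np.asarray(C, dtype=float)
    m, n = C.shape
    rows, cols = C.sum(axis=1), C.sum(axis=0)
    pe = rows / np.asarray(m_i, dtype=float)  # share of motif i's instances found
    pp = cols / np.asarray(n_j, dtype=float)  # share of class j tied to a motif
    se = np.array([pe[i] * one_minus_norm_entropy(C[i] / rows[i], n)
                   if rows[i] > 0 else 0.0 for i in range(m)])
    sp = np.array([pp[j] * one_minus_norm_entropy(C[:, j] / cols[j], m)
                   if cols[j] > 0 else 0.0 for j in range(n)])
    return se, sp


se, sp = weighted_se_sp([[2, 2], [0, 3]], m_i=[4, 4], n_j=[3, 5])
```

With $m_1 = 4$ but only 3 instances of motif 1 found, its otherwise perfect sensitivity is scaled down to 0.75.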
Furthermore, to obtain an appropriate measure for motif instances possibly discovered as several tentative motifs, we introduce the notion of "split rate" for each instance $k$ of motif $i$, $1/\eta_{ik}$, where $\eta_{ik}$ is the number of tentative motifs associated with instance $k$ of motif $i$. When filling in the new confusion matrix $C'$, we then do not add 1 to $c_{ij}$ when instance $k$ of motif $i$ is recognized in class $j$, but add $1/\eta_{ik}$ to $c'_{ij}$. Consequently, $\sum_{j=1}^{n} c'_{ij}$ represents the number of instances of motif $i$ that are correctly identified, even if not correctly classified, and we have $\sum_{j=1}^{n} c'_{ij} \le m_i$; and $\sum_{i=1}^{m} c'_{ij}$ represents the number of distinct motif instances represented in class $j$. This last value is not necessarily an integer, because an element of a class can represent only one part of a motif's instance, the other part(s) being classified in other classes.
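The split-rate filling of $C'$ can be sketched as below. The input format, one `(motif i, instance k, class j)` triple per tentative motif matched to an instance, and the function name are assumptions.

```python
from collections import Counter

import numpy as np


def split_confusion_matrix(tentative, m, n):
    """C' filled with split rates 1 / eta_ik.

    tentative -- one (motif i, instance k, class j) triple per tentative motif
                 matched to instance k of motif i (hypothetical input format)
    """
    eta = Counter((i, k) for i, k, _ in tentative)  # tentative motifs per instance
    C = np.zeros((m, n))
    for i, k, j in tentative:
        C[i, j] += 1.0 / eta[(i, k)]
    return C


# Instance 0 of motif 0 is split into two tentative motifs, in classes 0 and 1.
C = split_confusion_matrix([(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 1)], m=2, n=2)
print(C.sum(axis=1), C.sum(axis=0))  # identified instances per motif, per class
```

The split instance contributes 0.5 to each of its two cells, so the row sums still count identified instances while a column sum (here 1.5) need not be an integer.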

Sensitivity and specificity are then defined using equations similar to (10) and (11), replacing $c_{ij}$ by $c'_{ij}$. We also redefine the "recognition rates" from (9) to take into account the new definition of the confusion matrix, $C'$, as:

$$pe_i = \frac{m_i - \bar{m}_i}{m_i} \qquad \text{and} \qquad pp_j = \frac{\tilde{n}_j}{n_j}$$

where $\bar{m}_i$ is the number of instances of motif $i$ not identified in any class, and $\tilde{n}_j$ the number of elements in class $j$ representative of any motif.



Figure 17 panels: experimental sequences n°0 to n°18, each plotted against time with tracks for moves (toilets, bathroom, bedroom, living room, kitchen), postures (lying down, sitting, standing), activity level, and mean heart rate.

Figure 17: Experimental sequences.

From top to bottom, sequences 1 to 8 should be considered close to sequence 0 (class 0), and sequences 9 to 18 far from this reference sequence (class 1).



We also introduce an index of segmentation of motif recognition, $A$, according to equation (12). A maximum value of 1 means that all motif instances are recognized as a whole.

$$A = \frac{1}{n \cdot m} \sum_{i=1}^{m} \sum_{j=1}^{n} \frac{c'_{ij}}{c_{ij}} \qquad (12)$$

Finally, we can define mean values of sensitivity and specificity as:

$$Se = \frac{1}{m} \sum_{i=1}^{m} Se_i \qquad \text{and} \qquad Sp = \frac{1}{n} \sum_{j=1}^{n} Sp_j$$

6.3 Quality of the method 

Given the high complexity of the problem and the need for several successive analysis steps, we validate the proposed approach at several levels, considering the following steps: (1) defining a similarity measure, (2) representation, and (3) mining operations. Validation relies on both expert assessment and mathematical and statistical analysis, depending on whether or not appropriate data are available for objective comparisons.

Figure 18: Distances between sequence 0 and the other experimental sequences, from either raw or pre-processed data, and using DTW and LCSS distances. Classes 0 and 1 correspond to the expected classification.

6.3.1 Similarity measure 

The approach defined for computing the distance between multivariate and heterogeneous time-series is tested and validated in particular under a strong presence of noise. We compare the performance of our approach, which involves LCSS, with that of DTW distances. The distance between time-series is expected to take low values between sequences corresponding to the same activity performed in the same conditions, and higher values otherwise. The experimental sequences form two sets (see figure 17):

(1) Sequences 1 to 8, representative of a given activity (getting ready in the morning), generated from a reference sequence (sequence 0) by adding three types of noise: stretching in time, variability in values, and interruptions (consecutive outliers).

(2) Sequences 9 to 18, representative of other activities such as sleeping, having a meal, or having a quiet activity, including three sequences (16 to 18) corresponding to the reference activity (same moves) but carried out in bad conditions, that is, (16) slowness, (17) long worrying interruptions, and (18) high values of the mean heart rate. These abnormal behaviors can be detected if sequences 16 to 18 are not considered representative of sequence 0.

Figure 19: Pairs of points considered as similar when computing LCSS and DTW distances.

The experimental process aims at classifying these sequences using a threshold on the distance to sequence 0. An appropriate distance should properly discriminate the sequences, associating 1 to 8 with class 0 and 9 to 18 with class 1. We use both DTW and LCSS distances for comparison (see [15] for a clear review of the DTW principle), and in each case the distances are computed from both raw and pre-processed data, that is, after sampling rate reduction to speed up the computation and filtering to remove some noise. Preliminary experiments were required to define relevant values for the LCSS parameters $(\varepsilon, \delta)$ in the context of our application.

The classification results are presented in figure 18. As a general comment, we notice that DTW distances are much lower than LCSS ones, due (1) to the different orders of computation, 1 for LCSS and 2 for DTW, and (2) to the possible multiple associations of any point in DTW, which can keep the distance quite low.

Figure 20: Distances observed according to selected values of $\varepsilon_{LCSS}$.
These results are obtained when classifying sequences 1 to 18 in class 0 or 1, depending on whether or not they are close to the reference sequence 0 according to a given distance threshold. The graph presents the mean distances observed for classes 0 and 1 between each sequence of the class and the reference sequence 0. The minimum and maximum decision thresholds required for a proper classification of sequences into class 0 or 1 are also plotted, as well as the mean distance observed between classes. In these samples, $\delta_{LCSS}$ has been selected as not restrictive.

• The left graph shows the variation of these distances when the similarity threshold on values, $\varepsilon_{LCSS}$, is set between 0 and 1, and $\delta_{LCSS} = 40$ minutes.

• The right graph shows these variations for increasing values of the similarity threshold in time, $\delta_{LCSS}$ (in minutes), with $\varepsilon_{LCSS} = 0.3$.

The superiority of LCSS over DTW is shown by the fact that the results match the expected classification only when LCSS is used. The DTW distance fails to properly classify the critical sequences 16 and 18. The behavior of LCSS (with $\delta$ set with no time restriction for associating points, as in DTW) and of DTW when comparing sequences 0 and 16 is illustrated in figure 19. DTW allows multiple associations, and all points must be matched, based on a minimum distance criterion. Because the sequences of moves and postures are very close, the few points corresponding to low activity levels and mean heart rate in sequence 0 are associated with the many such points in sequence 16, and conversely for high values of activity levels and mean heart rate. This results in a low number of pairs with large distances, so that the distance between the two sequences remains low. The strength of LCSS is to base the similarity of points on a threshold criterion, which tolerates outliers and excludes overlapping pairs. An even higher LCSS distance is obtained for sequence 16 by restricting the value of $\delta$.

Wc also notice that the two classes arc better separated when computing the distances 
from the preprocessed data. Filtering the sequences indeed results in removing at least part 
of the variability in the values. 
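To make the multiple-association effect concrete, here is a minimal Python sketch on a hypothetical univariate toy pair (not the simulated sequences of the study). It assumes a classic DTW with absolute-difference cost, and an LCSS distance normalized as 1 - LCSS/min(n, m) in the style of Vlachos et al., with no time restriction; the function names are illustrative only.

```python
import math
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW with absolute-difference cost: every point must be
    associated, and one point may be associated with several others."""
    n, m = len(a), len(b)
    d: List[List[float]] = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def lcss_distance(a: Sequence[float], b: Sequence[float], eps: float) -> float:
    """1 - LCSS/min(n, m): each point is matched at most once, and only
    if the two values differ by at most eps (no time restriction here)."""
    n, m = len(a), len(b)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) <= eps:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return 1.0 - L[n][m] / min(n, m)


a = [0, 0, 0, 5, 5, 5]  # few high values
b = [0, 5, 5, 5, 5, 5]  # many high values
print(dtw_distance(a, b))        # 0.0: the three 0s of a all map to the single 0 of b
print(lcss_distance(a, b, 0.5))  # about 0.333: only 4 one-to-one matches out of 6
```

DTW absorbs the difference in proportions through many-to-one associations, whereas LCSS leaves the surplus points unmatched, which is the behavior the paragraph above describes.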

The key parameters defining the similarity measure are therefore:

• Maximum temporal difference (δ_LCSS) between two points so that they can be considered as similar.

• Maximum difference in the values (ε_LCSS) of two points so that they can be considered as similar.
The choice of δ_LCSS and ε_LCSS constrains the selection of a relevant decision threshold to properly classify sequences, as illustrated in Figure 20. For instance, increasing values of ε_LCSS result in decreasing distance measures, so that an appropriate decision threshold must be set lower and lower. The distances however stabilize once ε_LCSS exceeds 0.5. Decreasing the value of δ_LCSS also results in increasing distances.
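The role of ε_LCSS and δ_LCSS can be sketched for heterogeneous multivariate points as follows. This is a minimal Python sketch under stated assumptions, not the authors' exact extension: each point is a hypothetical (time in minutes, quantitative values scaled to [0, 1], qualitative values) triple, two points are similar if they are at most δ apart in time, every quantitative component differs by at most ε, and all qualitative components are equal; the distance is 1 - LCSS/min(n, m).

```python
from typing import List, Tuple

# Hypothetical point layout: (time in minutes, quantitative values scaled to [0, 1],
# qualitative values such as room and posture).
Point = Tuple[float, Tuple[float, ...], Tuple[str, ...]]


def similar(p: Point, q: Point, eps: float, delta: float) -> bool:
    """Assumed matching rule: close in time (delta), close on every
    quantitative component (eps), identical qualitative components."""
    tp, xp, cp = p
    tq, xq, cq = q
    if abs(tp - tq) > delta:
        return False
    if any(abs(u - v) > eps for u, v in zip(xp, xq)):
        return False
    return cp == cq


def lcss_distance(a: List[Point], b: List[Point], eps: float, delta: float) -> float:
    """1 - LCSS/min(n, m); each point takes part in at most one pair."""
    n, m = len(a), len(b)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(L[i - 1][j], L[i][j - 1])
            if similar(a[i - 1], b[j - 1], eps, delta):
                best = max(best, L[i - 1][j - 1] + 1)
            L[i][j] = best
    return 1.0 - L[n][m] / min(n, m)


seq_a = [(0, (0.20, 0.30), ("kitchen", "standing")),
         (30, (0.60, 0.50), ("kitchen", "standing")),
         (60, (0.10, 0.20), ("bedroom", "lying"))]
seq_b = [(5, (0.25, 0.35), ("kitchen", "standing")),
         (40, (0.95, 0.90), ("kitchen", "standing")),
         (65, (0.10, 0.25), ("bedroom", "lying"))]

print(lcss_distance(seq_a, seq_b, eps=0.3, delta=40))  # middle points differ too much in value
print(lcss_distance(seq_a, seq_b, eps=0.5, delta=40))  # 0.0: all three points match
print(lcss_distance(seq_a, seq_b, eps=0.5, delta=2))   # 1.0: no pair is close enough in time
```

On this made-up pair, a larger ε lowers the distance and a smaller δ raises it, in line with the trends reported for Figure 20.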

6.3.2 Representation 

The representation stage is validated at two levels:

1. Checking with experts that the representation step preserves, within the sequences, the trends of variation they consider important for identifying the behavioral profile of a person.

2. Analyzing the influence of each representation step on preserving these fundamental trends according to the purpose of the study.

The representation of temporal sequences includes preprocessing, discretization and aggregation. Validation is performed using intuitive knowledge from experts. The representation steps are in any case strongly constrained by the context and purpose of the decision problem, so that there are few choices in selecting appropriate values for the parameters involved at this stage. After several experiments with the few possible parameter values, the representation of raw time-series (generated by simulation) is intuitively evaluated against the goal of preserving the global trends while removing insignificant variations, from the perspective of studying the activities of daily living of a person. We then define the key parameters as follows:

• Filter type and length: we use a weighted mean filter, which highlights the global trends while preserving the peaks in the values.

• Rate of temporal reduction: some analyses show that a rough temporal reduction of the initial time-series may remove critical points, especially peaks in the values. Moreover, at the end of the representation step, the aggregation of time-series produces approximately the same number of symbols representing the original sequence, whatever the temporal reduction. We therefore rely on meaningfully aggregating successive vectors to reduce the sequence length.

• Number of discretization intervals: the meaningful number of discretization intervals for the values of quantitative parameters may be approximately defined by experts. For the activity level and mean heart rate, the intuitive qualification of possible variations as "resting", "low", "moderate", and "high" leads us to define four discretization intervals. This rough idea could be refined by considering the system's performance. We use the k-means algorithm to determine appropriate bounds of the discrete intervals for each monitored person (see NS.l.lf). The obtained results are roughly in agreement with related academic knowledge, as illustrated in Table 4.

• Minimum distance threshold: we use a minimum distance of zero as the threshold for aggregating successive vectors. This means that aggregation into one symbol is allowed only along subsequences where the parameters show no significant variation (that is, successive discretized vectors are identical), so that we can intuitively assume that the person is performing the same "action" throughout the corresponding duration. Increasing the minimum distance does not seem appropriate because, especially for qualitative parameters, a change in a value corresponds to a change in the room occupied or the posture, which intuitively represents a change in the "elementary action" performed.

              Academic knowledge               Experimentation
Interval      Activity level   Mean heart rate  Activity level   Mean heart rate
1             Rest             « 65             < 1.8            < 75
2             Very light       < 75             1.8 to 3.8       75 to 92
3             Light            75 to 100        3.8 to 7         92 to 104
4             Moderate         100 to 125       > 7              104 to 120

Table 4: Discretization intervals for the activity level (arbitrary unit) and the mean heart rate (beats per minute) in comparison with empirical boundaries [25].
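A minimal Python sketch of the k-means step for one quantitative parameter is given below. The source does not specify the k-means variant, its initialization, or how interval bounds are derived from the centroids; the deterministic start and the midpoint bounds used here are assumptions, and the heart-rate values are made up for illustration.

```python
import bisect
from typing import List, Sequence


def kmeans_1d(values: Sequence[float], k: int, iterations: int = 100) -> List[float]:
    """Plain Lloyd iterations on scalar values; returns sorted centroids."""
    xs = sorted(values)
    # Deterministic start (an assumption): evenly spaced order statistics.
    centroids = [float(xs[(2 * i + 1) * len(xs) // (2 * k)]) for i in range(k)]
    for _ in range(iterations):
        clusters: List[List[float]] = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda c: abs(x - centroids[c]))
            clusters[nearest].append(x)
        # Keep the previous centroid if a cluster becomes empty.
        updated = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def interval_bounds(values: Sequence[float], k: int = 4) -> List[float]:
    """Assumed rule: bounds placed halfway between consecutive centroids."""
    c = kmeans_1d(values, k)
    return [(c[i] + c[i + 1]) / 2 for i in range(k - 1)]


def discretize(x: float, bounds: List[float]) -> int:
    """Index (0 to len(bounds)) of the discrete interval containing x."""
    return bisect.bisect_right(bounds, x)


heart_rates = [60, 62, 64, 80, 82, 84, 98, 100, 102, 115, 117, 119]  # made-up values
bounds = interval_bounds(heart_rates)
print(bounds)                                           # [72.0, 91.0, 108.5]
print([discretize(x, bounds) for x in (70, 95, 120)])  # [0, 2, 3]
```

Running the clustering separately for each monitored person gives person-specific bounds, as the bullet on discretization intervals suggests.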

As a consequence, the number of discretization intervals is the only parameter whose value could be refined by studying its influence on the system's performance, but its possible values are highly constrained anyway. Experiments then confirmed the relevance of using 4 discrete intervals in our context. For further experiments, we therefore use the most intuitively appropriate parameter values, that is: a weighted mean filter, no temporal reduction, 4 discretization intervals, and a null minimum distance threshold for aggregation.
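The filtering and aggregation steps of this configuration can be sketched in Python as follows. The triangular weights (1, 2, 3, 2, 1), the edge renormalization, and the count of differing components used as the distance between discrete vectors are all assumptions for illustration; with a null minimum distance, only identical successive vectors are merged into one symbol.

```python
from typing import Hashable, List, Sequence, Tuple

Vector = Tuple[Hashable, ...]


def weighted_mean_filter(x: Sequence[float],
                         weights: Sequence[float] = (1, 2, 3, 2, 1)) -> List[float]:
    """Centered weighted moving average; weights are renormalized at the edges."""
    half = len(weights) // 2
    out = []
    for i in range(len(x)):
        num = den = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if 0 <= j < len(x):
                num += w * x[j]
                den += w
        out.append(num / den)
    return out


def mismatches(u: Vector, v: Vector) -> int:
    """Hypothetical distance between discrete vectors: number of differing components."""
    return sum(a != b for a, b in zip(u, v))


def aggregate(vectors: Sequence[Vector], min_distance: int = 0) -> List[Tuple[Vector, int, int]]:
    """Merge successive vectors into symbols (vector, start index, length)."""
    symbols: List[Tuple[Vector, int, int]] = []
    for t, v in enumerate(vectors):
        if symbols and mismatches(symbols[-1][0], v) <= min_distance:
            vec, start, length = symbols[-1]
            symbols[-1] = (vec, start, length + 1)
        else:
            symbols.append((v, t, 1))
    return symbols


# Made-up discretized vectors: (activity interval, room).
vectors = [(1, "kitchen"), (1, "kitchen"), (2, "kitchen"),
           (2, "kitchen"), (2, "kitchen"), (0, "bedroom")]
print(aggregate(vectors))
# [((1, 'kitchen'), 0, 2), ((2, 'kitchen'), 2, 3), ((0, 'bedroom'), 5, 1)]
```

Because each symbol keeps its start and length, symbols may cover durations of different lengths, which is what later allows for stretching in time between motifs' instances.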

6.3.3 Mining operations 

Mining operations include feature mining (that is, projections, collision matrix examination, and tentative motifs extraction) and clustering. The parameters involved at this stage are constrained to different degrees and have a specific influence on the system's performance.

Due to the experimental context, we have identified three highly constrained parameters, for which we can select the most appropriate value a priori. Some experiments can however be performed to possibly refine this intuitive choice.

• Number of symbols defining the length of the basic subsequences used for projections. This length should correspond to the minimum number of symbols defining a motif. This parameter therefore depends strongly on the level of representation of the original sequence in terms of successive aggregated vectors (the symbols). In our context, it corresponds to the minimum number of "actions" successively performed by a person in their activities of daily living, which we have intuitively set to 4 symbols. Selecting a lower value, such as 3 symbols, could be appropriate but, conversely, we might then miss some motifs' instances.

• Projection mask: defines the number of symbols, as well as the number of parameters of each symbol, that are not considered when comparing basic subsequences after projection. Similar subsequences may thus differ in these numbers of symbols and components. These parameters are consequently determined according to the rate of noise allowed between two similar subsequences. Given that we consider 4 dimensions for defining symbols, and 4 symbols in basic subsequences, the projection mask is defined so that we project 3 symbols and 3 components of these symbols. This choice is supported by the intuitive requirement of a low number of collisions between different sequences, assessed by observing the mean percentage of collisions between sequences of class 0 or class 1 and the reference sequence (see figure ITT), as illustrated in Table 5.
                      Mean % of collisions        Median % of collisions
Size of projection    Class 0  Class 1  Gap       Class 0  Class 1  Gap
3x3                   46.1     8.5      37.6      45.5     3.5      42.0
2x3                   56.2     18.3     37.9      61.5     14.5     47.0
3x2                   58.2     20.3     37.9      64.0     21.0     43.0
2x2                   67.5     34.5     33.0      78.5     39.5     39.0

Table 5: Percentage of collisions observed between sequences of class 0 and class 1 and the reference sequence according to the size of the projection mask.

The size of the projection mask corresponds to the number of projected symbols x the number of projected parameters per symbol. The gap is the difference between the class 0 and class 1 percentages.

• Number of projections performed to build the collision matrix. The number of projections should ensure that (1) we do not obtain a high number of collisions by chance (specificity), and (2) similar subsequences, as defined in our context, correspond to a high number of collisions (sensitivity). As the number of symbols defining basic subsequences, the dimensionality, and the rate of possible noise increase, the number of projections should also increase to get significant results. Given that these factors are well defined by the context, we can in any case use a number of projections appropriate for the worst case.
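The collision matrix can be sketched in Python in the spirit of the random projection algorithm of Chiu et al. [7], adapted to multidimensional symbols. Assumptions, not taken from the source: one random mask per projection keeps the same 3 components for the 3 kept symbol positions, symbols are compared by exact equality, symbol durations are ignored, the seed is fixed, and pairs of overlapping subsequences are skipped as trivial matches. The symbols themselves are made up.

```python
import random
from collections import defaultdict
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Tuple

Symbol = Tuple[Hashable, ...]


def collision_matrix(symbols: Sequence[Symbol], w: int = 4, kept_positions: int = 3,
                     kept_components: int = 3, n_projections: int = 20,
                     seed: int = 0) -> Dict[Tuple[int, int], int]:
    """Count, for each pair of non-overlapping basic subsequences of w symbols,
    how many random projections map them to the same key."""
    rng = random.Random(seed)
    subsequences = [tuple(symbols[i:i + w]) for i in range(len(symbols) - w + 1)]
    dims = len(symbols[0])
    collisions: Dict[Tuple[int, int], int] = defaultdict(int)
    for _ in range(n_projections):
        # One mask per projection, shared by all subsequences.
        positions = sorted(rng.sample(range(w), kept_positions))
        components = sorted(rng.sample(range(dims), kept_components))
        buckets: Dict[Tuple[Hashable, ...], List[int]] = defaultdict(list)
        for idx, sub in enumerate(subsequences):
            key = tuple(sub[p][c] for p in positions for c in components)
            buckets[key].append(idx)
        for members in buckets.values():
            for i, j in combinations(members, 2):
                if j - i >= w:  # skip trivial matches between overlapping subsequences
                    collisions[(i, j)] += 1
    return dict(collisions)


motif = [(1, 2, "kitchen", "standing"), (2, 2, "kitchen", "walking"),
         (0, 1, "living", "sitting"), (0, 0, "bedroom", "lying")]
filler = [(3, 3, "hall", "walking"), (1, 1, "bathroom", "standing"),
          (2, 3, "hall", "walking")]
seq = motif + filler + motif  # the second instance starts at symbol 7
cm = collision_matrix(seq)
print(cm[(0, 7)])  # 20: identical subsequences collide under every projection
```

Noisy instances that differ on a single symbol or component still collide whenever the mask hides that difference, which is why the mask size is tied to the tolerated rate of noise.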

Other key parameters used for mining operations strongly influence the system's performance, but cannot be determined so easily:

• Minimum collisions threshold. This parameter defines the number of collisions considered as "significant" in terms of similarity of the corresponding subsequences. This threshold should not be too high, otherwise we may miss some tentative motifs. Given that later steps refine the decision about whether a subsequence is representative of a motif, we prefer selecting "too many" candidates at this stage. We should however take care that the main benefit of projections is preserved: avoiding the examination of collisions between all possible pairs of subsequences.

• Maximum distance threshold. This parameter sets an upper bound on the actual distance between subsequences for them to be considered as similar, and eventually as representative of the same motif. A compromise must be found between effectively considering all subsequences representative of the same behavior as similar, even in the presence of noise (sensitivity), and not including wrong subsequences as representative of that behavior (specificity).
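How the two thresholds interact can be sketched in Python: the collisions threshold filters pairs cheaply, and only the surviving pairs are checked against the maximum distance threshold. In the study the actual distance is LCSS-based; the `mismatch_fraction` function below is a hypothetical stand-in, and the collision counts and subsequences are made up.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def mismatch_fraction(s: Sequence[Tuple], t: Sequence[Tuple]) -> float:
    """Hypothetical distance: share of (symbol, component) entries that differ."""
    total = sum(len(sym) for sym in s)
    differing = sum(x != y for a, b in zip(s, t) for x, y in zip(a, b))
    return differing / total


def tentative_pairs(collisions: Dict[Tuple[int, int], int], subsequences: Sequence,
                    distance: Callable, min_collisions: int,
                    max_distance: float) -> List[Tuple[int, int]]:
    """Keep pairs with enough collisions, then confirm them with the actual distance."""
    candidates = [pair for pair, count in collisions.items() if count >= min_collisions]
    return sorted(pair for pair in candidates
                  if distance(subsequences[pair[0]], subsequences[pair[1]]) <= max_distance)


subsequences = [((1, "kitchen"), (2, "kitchen")),
                ((1, "kitchen"), (2, "bedroom")),
                ((3, "bedroom"), (0, "living"))]
collisions = {(0, 1): 15, (0, 2): 12, (1, 2): 2}
print(tentative_pairs(collisions, subsequences, mismatch_fraction,
                      min_collisions=10, max_distance=0.3))  # [(0, 1)]
```

Lowering `min_collisions` lets more pairs reach the distance check, so the distance threshold then carries more of the decision, as discussed for the collisions threshold in Section 6.4.1.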

According to the purpose of motifs extraction, and considering the length of the representation of simulated time-series corresponding to typical activities of a person at home, we use basic subsequences of 4 symbols for projections. The projection mask has a length of one unit for both the number of symbols and the number of components of each symbol, that is, 3 of the 4 symbols and 3 of the 4 components are projected.

6.4 Quality of the results 

First experiments are performed using temporal sequences whose structure is known a priori. Several instances of a motif randomly selected in simulated sequences are introduced, with a moderate amount of noise, into a "non-pattern" sequence generated from random moves. Different types of noise are added to motifs' instances when they are introduced into non-pattern signals, that is: noisy values, interruptions, and stretching in time (see Section 6.2). A sample run of this experimental process in a noisy context is illustrated by the graphs of Figure 21.

Figure 21: A sample of pattern recognition from time-series.
From left to right, and top to bottom, the graphs represent (1) a reference motif, (2) introduced several times a day as noisy instances over a "non-pattern" sequence. Then, (3) the next graph represents frequent subsequences as extracted and classified together from the analysis of the previously defined "pattern" and "non-pattern" sequence, and finally (4) the subsequence defined as representative of the motif.

After checking the general performance of the system, we specifically test its performance in a particularly noisy context (test of sensitivity), as well as the results obtained in the presence of abnormally modified motifs' instances (test of specificity).
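A test sequence of this kind can be sketched in Python as below. The noise model is an assumption for illustration, not the one defined in Section 6.2: values are univariate and scaled to [0, 1], noisy values are additive Gaussian noise, stretching is nearest-index resampling, an interruption is a run of random points inserted in the middle of an instance, and the "non-pattern" background is uniform random. The function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def noisy_instance(motif: Sequence[float], rng: random.Random, value_noise: float = 0.05,
                   stretch: float = 1.0, interruption: int = 0) -> List[float]:
    """Hypothetical noise model for one motif instance (values scaled to [0, 1])."""
    # Stretching in time: nearest-index resampling to about len(motif) * stretch points.
    n = max(1, round(len(motif) * stretch))
    out = [motif[min(len(motif) - 1, int(i / stretch))] for i in range(n)]
    # Noisy values: additive Gaussian noise, clipped to [0, 1].
    out = [min(1.0, max(0.0, v + rng.gauss(0.0, value_noise))) for v in out]
    # Interruption: random "non-pattern" points inserted in the middle of the instance.
    mid = len(out) // 2
    out[mid:mid] = [rng.random() for _ in range(interruption)]
    return out


def build_test_sequence(motif: Sequence[float], n_instances: int, background_length: int,
                        rng: random.Random, **noise) -> Tuple[List[float], List[Tuple[int, int]]]:
    """Insert noisy instances into a random background; return the sequence and the
    (start, end) indexes of each inserted instance as ground truth."""
    background = [rng.random() for _ in range(background_length)]
    cuts = sorted(rng.sample(range(background_length), n_instances))
    seq: List[float] = []
    truth: List[Tuple[int, int]] = []
    prev = 0
    for cut in cuts:
        seq.extend(background[prev:cut])
        instance = noisy_instance(motif, rng, **noise)
        truth.append((len(seq), len(seq) + len(instance)))
        seq.extend(instance)
        prev = cut
    seq.extend(background[prev:])
    return seq, truth


rng = random.Random(1)
seq, truth = build_test_sequence([0.1, 0.5, 0.9, 0.5], 3, 50, rng,
                                 value_noise=0.02, stretch=1.5, interruption=2)
print(len(seq), truth)  # 74 points; three instances of 8 points each
```

Keeping the ground-truth indexes makes it possible to score both the identification of tentative motifs and their classification afterwards.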

6.4.1 General performance results 

The first experimental goal is to study the influence of the key parameters on the performance of motifs extraction, evaluated in terms of the sensitivity and specificity of identifying the tentative motifs and classifying them into motifs. Because tentative motifs are selected from discrete and aggregated subsequences, they cannot be identified very precisely in terms of their start and end indexes (sensitivity and specificity of extraction). In our context, however, the purpose is to identify the occurrence of a motif's instance, without the need for its exact time and duration. For classification, good performance is much more fundamental. Ideally, we need a "perfect" classification, even if motifs' instances are not precisely located in time, so that we can recognize all the main activities of a person at home. The system may fail to classify motifs' instances correctly because of poor similarity measures relative to the distance threshold. This may happen because one of the motif's instances has been defined too roughly, for instance including too many "non-pattern" points and missing too many "pattern" ones, so that the distance increases. Improving the precision of tentative motifs identification may then be indirectly required to get better clustering performance.

The critical parameters under study are the collisions and distance thresholds, as well as the thresholds used for computing the similarity measure, that is, the maximum difference between two similar points in terms of values, ε_LCSS, and time, δ_LCSS. A default system configuration is then defined after many experiments with varying values of the key parameters, looking for the best performance in terms of classification results. The graphs of Figure 22 present the system's performance for some possible values of these thresholds. The results highlight the complexity of selecting appropriate values for these parameters, especially because of their relative influence on the system's performance. This is particularly noticeable in the results associated with varying values of the maximum distances in values (ε_LCSS) and time (δ_LCSS) defining the similarity measure.

The collisions threshold does not strongly influence the system's performance, and does not need to be precisely defined. In this specific case, the only benefit of a collisions threshold lower than the number of projections is to cope with possible imprecision in the representation step. If the minimum collisions threshold is too high, we might miss some motifs' instances. Conversely, if it is too low, we possibly accept too many subsequences as potential tentative motifs, which then requires a lower distance threshold for a proper identification of tentative motifs.

The distance threshold is clearly more critical, since it is responsible for the final decision about whether or not a frequent subsequence is a motif. For tentative motifs extraction, both true positive and false positive rates increase with the decision threshold: more motifs' instances are correctly identified, but so are more non-pattern subsequences. On the other hand, performance on the classification task shows complex variations of true and false positive rates. Above a certain value, increasing the decision threshold gives better sensitivity (more subsequences are considered as similar) but lower specificity (at the same time, some subsequences might be wrongly considered as similar). Lower decision thresholds nevertheless give better overall performance, possibly due to a higher precision in identifying the motifs' instances, which then contain few "non-pattern" vectors, so that similarity measures are much more relevant for classification. Consequently, we noticed a close relation between the parameters involved in motifs extraction and classification. Difficult compromises must therefore be found in defining appropriate values for these parameters, especially in contexts where the system needs to tolerate different types of noise.

Figure 22: Mean performance results in terms of true (sensitivity) and false positive (1-specificity) rates of the two steps of identifying the tentative motifs and classifying them into motifs, considering a moderate amount of noise between motifs' instances.
We observe individually the influence of each critical parameter in a default configuration of the others.

Overall, according to the results presented in Table 6, the proposed approach for motifs extraction gives good results in terms of sensitivity and specificity of both extraction and classification of motifs. We notice, however, that the performance indexes are higher for identification than for classification, and higher for specificity than for sensitivity. The system might sometimes fail to identify whole motifs' instances, and some motifs' instances may consequently be missed in the corresponding class. Perfect classification is nevertheless possible in some cases. The large variability in the results may be partially due to the random selection of motifs for each experiment: the corresponding subsequences might not all be representative of the same level of recurrent behavior.





                        Identification     Classification     Segmentation
Indexes                 Se       Sp        Se       Sp        A
Mean                    0.71     0.92      0.66     0.79      0.89
Standard deviation      ?        0.07      0.34     0.26      0.19
Perfect indexes         -        -         35%      60%       70%
Perfect classification                     20%

Table 6: Mean performance results of motifs extraction in a default system configuration, considering a moderate amount of noise between motifs' instances.

The table presents the mean performance in terms of sensitivity (Se) and specificity (Sp) of the two steps of identifying the tentative motifs and classifying them into motifs, as well as the segmentation index of motifs recognition (A). All indexes fall into [0,1], the maximum values corresponding to perfect results.
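For reference, the Se and Sp indexes follow the usual definitions Se = TP / (TP + FN) and Sp = TN / (TN + FP). The short Python sketch below computes them from per-subsequence labels; how the study counts positives at the identification and classification levels is not detailed here, so the labels are hypothetical, and returning 1.0 when a denominator is zero is an assumption.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(truth: Sequence[bool], predicted: Sequence[bool]) -> Tuple[float, float]:
    """Se = TP / (TP + FN), Sp = TN / (TN + FP), both in [0, 1]."""
    tp = sum(t and p for t, p in zip(truth, predicted))
    fn = sum(t and not p for t, p in zip(truth, predicted))
    tn = sum(not t and not p for t, p in zip(truth, predicted))
    fp = sum(not t and p for t, p in zip(truth, predicted))
    se = tp / (tp + fn) if tp + fn else 1.0
    sp = tn / (tn + fp) if tn + fp else 1.0
    return se, sp


# Hypothetical labels: is each candidate subsequence a motif instance (truth),
# and was it retained by the system (predicted)?
truth = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False, True, False]
print(sensitivity_specificity(truth, predicted))  # (0.75, 0.8333333333333334)
```

The true and false positive rates plotted in Figures 22 and 23 correspond to Se and 1 - Sp respectively.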

6.4.2 Sensitivity test 

The previous study gives an idea of appropriate values for the parameters of motifs extraction, so that we can observe how adding noise influences the system's performance with default values set for the extraction parameters. In our experimental context, the system should tolerate certain amounts of variability in values, stretching in time, and interruptions.

Experimental results, reported in Figure 23, show good performance for pattern extraction and classification even in the presence of noise. Increasing amounts of the different types of noise do degrade the sensitivity indexes. With default values of the extraction parameters, the system remains especially efficient in the presence of noisy values or stretching in duration, and is less resistant to long interruptions in motifs' instances. In any case, in our experimental context, we do not really know at this stage where the boundary between "normal" and "abnormal" behaviors lies.

These results also highlight the complexity of defining appropriate extraction parameters to deal with all possible types of noise. On the one hand, introducing a large amount of noise in values or duration mainly requires increasing the values of ε_LCSS and δ_LCSS constraining the similarity between points: high rates of noise in values especially require a higher ε_LCSS, whereas large possible variations in duration require increasing δ_LCSS. On the other hand, introducing interruptions in motifs' instances mainly influences the selection of appropriate collisions and distance thresholds: it requires reducing the collisions threshold while increasing the distance threshold.

Figure 23: Mean performance results in terms of true (sensitivity) and false positive (1-specificity) rates of the two steps of identifying the tentative motifs and classifying them into motifs, considering varying rates of noise between motifs' instances.
We observe individually the influence of each type of noise (variability in values, stretching in time, interruptions) on the performance measures in a default configuration of the system.

6.4.3 Specificity test 

Another test required in the context of critical situation detection is to check the specificity of the system, that is, that abnormally modified instances are not classified into the class corresponding to "normal" ones. Figure 24 shows a sample of introducing worrying changes in the mean heart rate features, which increase independently of the activity level. Table 7 reports the classification results obtained when successively introducing, within a "non-pattern" sequence, eight "normal" noisy instances of a given motif followed by four abnormally modified ones. We especially notice that abnormal instances are rarely associated with the "normal" class. Their recognition rate as "normal" also decreases as the worrying change rate increases. These results indicate that the system may be able to detect critical situations.
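The specificity protocol can be sketched in Python as follows: a worrying change raises the mean heart rate by a given rate independently of the activity level, and the recognition rate is the share of instances assigned to the "normal" class. The vector layout (activity level, mean heart rate, room, posture) and the `toy_is_normal` rule are hypothetical stand-ins; the system itself decides with LCSS-based distance thresholds.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Tuple  # hypothetical layout: (activity level, mean heart rate, room, posture)


def add_worrying_change(instance: Sequence[Vector], rate: float,
                        hr_index: int = 1) -> List[Vector]:
    """Raise the mean heart rate by `rate` (0.2 = +20%), leaving activity,
    room and posture unchanged."""
    return [tuple(v * (1 + rate) if k == hr_index else v for k, v in enumerate(point))
            for point in instance]


def recognition_rate(instances: Sequence[Sequence[Vector]],
                     is_normal: Callable[[Sequence[Vector]], bool]) -> float:
    """Share of instances assigned to the "normal" class."""
    return sum(bool(is_normal(x)) for x in instances) / len(instances)


reference = [(2.0, 80.0, "kitchen", "standing"), (1.0, 70.0, "living", "sitting")]


def toy_is_normal(instance: Sequence[Vector], tolerance: float = 0.15) -> bool:
    # Stand-in classifier (not the system's LCSS-based decision):
    # heart rate within 15% of the reference at every point.
    return all(abs(p[1] - r[1]) / r[1] <= tolerance for p, r in zip(instance, reference))


instances = [add_worrying_change(reference, rate) for rate in (0.0, 0.1, 0.2, 0.3)]
print(recognition_rate(instances, toy_is_normal))  # 0.5: only the 0% and 10% changes pass
```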



Change in behavior      "Normal"    Worrying
Worrying change rate    0%          10%     20%     30%     40%
Recognition rate        77.5%       25%     10%     10%     5%

Table 7: Results of classifying tentative motifs from a learning sequence containing both "normal" and abnormally modified motifs' instances.

The table presents the mean percentage of each type of motif instance ("normal" and abnormally modified) properly identified and classified in the so-called "normal" class.

6.5 Identifying recurrent behaviors from simulated sequences

As a validation step, we observe the results of extracting motifs from a learning sequence generated over seven days by our simulation process. Simulated sequences may contain subsequences corresponding to recurrent behaviors of the person at home, even if we do not have any a priori knowledge about their explicit features. The results reported in Figure 25 show that the extracted motifs can be easily interpreted in terms of possible activities of daily living. The observed frequencies are however lower than expected, possibly due to some imprecision or errors in the simulation process, or to an incorrect adjustment of the learning parameters. These results nonetheless support the potential of the proposed approach for motifs extraction in this experimental context.

7 Conclusion and perspectives 

In this paper, we have proposed an approach for mining heterogeneous multivariate time-series to identify meaningful patterns. Some interesting features of the proposed method are its ability to extract motifs from time-series containing both pattern and non-pattern subsequences, in a completely unsupervised way, allowing for noisy values between motifs' instances, and without the need for large learning data sets: the presence of two instances of a frequent pattern is theoretically enough to identify the corresponding motif. Furthermore, the proposed approach presents several advantages compared with previous work on finding time-series motifs.
Figure 24: Sample of introducing worrying changes between motifs' instances.
The motifs' instances are drawn in bold within a "non-pattern" sequence, with increasing rates of abnormal changes. In this sample, the values of the mean heart rate are globally increased, independently of the activity, by 10 to 40%.
Figure 25: Features of recurrent behaviors identified within a learning sequence simulated for a given person over seven successive days.

A plausible interpretation of each motif is given in terms of possible activities of daily living.
First, we have considered the general case of multivariate and heterogeneous time-series. This particularly requires defining (1) a homogeneous representation of time-series to first roughly identify frequent subsequences, and (2) a similarity measure appropriate for heterogeneous multivariate time-series, in order to compare these frequent subsequences precisely. We have therefore extended the non-metric distance based on LCSS studied by Vlachos et al. [IHS] to this general context.

Second, we have extended the projection algorithm studied by Chiu et al. [7] for finding time-series motifs to the case of multidimensional symbols defining discrete time-series. This algorithm is indeed particularly suited to a strong presence of noise. Our use of this algorithm also differs from [7] in defining symbols of possibly different lengths, allowing for stretching in time between discovered motifs' instances. We have also proposed a method for extending basic recurrent subsequences to the identification of representative subsequences in terms of observing daily living habits. Finally, a divisive approach to clustering allows us to synthesize this set of recurrent subsequences into non-overlapping tentative motifs, which are then classified into motifs.

Another specificity of our approach lies in the wide range of scales involved, from the level of detail embedded in raw data up to the decision level. This requires defining several levels of extraction of relevant information from the original time-series, up to the decision level. However, the decision purpose considered in our experimental context does not restrict at all the possible use of our approach at different temporal scales or levels of detail, or for other applications. First, the same approach can be used at a different temporal scale by simply changing the frequency of raw data. Second, appropriate parameter values can be determined to deal with other contexts and decision purposes. The levels of both representation and mining operations can be adapted to a given experimental context. At the representation stage, increasing the filter length or the minimum distance threshold for aggregation, and/or reducing the number of discrete intervals, leads to the extraction of "higher level" motifs, and conversely. At the mining stage, changing the parameter values also modifies the granularity of the study. For instance, more precise frequent patterns are extracted by reducing the maximum distance threshold, and longer motifs are identified by increasing the number of symbols defining basic subsequences for projections. Generally, the proposed approach could be appropriate for any application that aims at profiling a usual behavior from the observation of a set of complementary parameters.

First experiments with the proposed approach for mining heterogeneous multivariate time-series give very promising results. The method correctly identifies a large number of motifs' instances introduced within non-pattern sequences, even in the presence of noise, allowing for variability in values, stretching in time, and interruptions. The results also highlight the difficulty of selecting appropriate parameters for a given experimental context. Each type of noise mainly influences one type of parameter, but these parameters are closely related to each other.

Additional experiments may require real data sets, first to better define "normality" and "abnormality" in terms of the related temporal features, and then especially to characterize subsequences representative of recurrent behaviors, so that the proposed approach can be tested more meaningfully. In the context of home health telecare, a further experimental step is then to validate that it is indeed possible to identify similar behaviors in this way from time-series recorded in realistic environments for monitoring a person at home.